Abstract — The replica method is a non-rigorous but widely-
accepted technique from statistical physics used in the asymptotic
analysis of large, random, nonlinear problems. This paper applies
the replica method to non-Gaussian maximum a posteriori (MAP)
estimation. It is shown that with random linear measurements
and Gaussian noise, the asymptotic behavior of the MAP estimate
of an n-dimensional vector "decouples" as n scalar MAP
estimators. The result is a counterpart to Guo and Verdú's replica
analysis of minimum mean-squared error estimation.

The replica MAP analysis can be readily applied to many esti- 
mators used in compressed sensing, including basis pursuit, lasso, 
linear estimation with thresholding, and zero norm-regularized 
estimation. In the case of lasso estimation the scalar estimator 
reduces to a soft-thresholding operator, and for zero norm- 
regularized estimation it reduces to a hard-threshold. Among 
other benefits, the replica method provides a computationally- 
tractable method for exactly computing various performance 
metrics including mean-squared error and sparsity pattern re- 
covery probability. 

Index Terms — Compressed sensing, Laplace's method, large 
deviations, least absolute shrinkage and selection operator (lasso), 
nonlinear estimation, non-Gaussian estimation, random matrices, 
sparsity, spin glasses, statistical mechanics, thresholding 



I. Introduction 
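Estimating a vector $\mathbf{x} \in \mathbb{R}^n$ from measurements of the form

$$\mathbf{y} = \Phi \mathbf{x} + \mathbf{w}, \qquad (1)$$

where $\Phi \in \mathbb{R}^{m \times n}$ represents a known measurement matrix
and $\mathbf{w} \in \mathbb{R}^m$ represents measurement errors or noise, is a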
generic problem that arises in a range of circumstances. One 
of the most basic estimators for x is the maximum a posteriori 
(MAP) estimate 



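$$\hat{\mathbf{x}}^{\rm map}(\mathbf{y}) = \arg\max_{\mathbf{x}} \, p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y}), \qquad (2)$$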



which is defined assuming some prior on x. For most priors, 
the MAP estimate is nonlinear and its behavior is not easily 
characterizable. Even if the priors for x and w are separable, 
the analysis of the MAP estimate may be difficult since the 
matrix $\Phi$ couples the n unknown components of x with the
m measurements in the vector y. 

The primary contribution of this paper is to show that with
certain large random $\Phi$ and Gaussian w, there is an asymptotic
decoupling of (1) into n scalar MAP estimation problems.
Each equivalent scalar problem has an appropriate scalar prior
and Gaussian noise with an effective noise level. The analysis
yields the asymptotic joint distribution of each component $x_j$
of $\mathbf{x}$ and its corresponding estimate $\hat{x}_j$ in the MAP estimate
vector $\hat{\mathbf{x}}^{\rm map}(\mathbf{y})$. From the joint distribution, various further
computations can be made, such as the mean-squared error 
(MSE) of the MAP estimate or the error probability of a 
hypothesis test computed from the MAP estimate. 

The analysis can quantify the effect of using a postulated 
prior different from the true prior. This has two important 
consequences: First, for many priors, the exact MAP estimate 
is computationally intractable; one can use our method to 
determine the asymptotic performance when using an ap- 
proximate prior that simplifies computations. Second, when 
MAP is not the criterion of interest, many popular estimation 
algorithms can be seen as MAP estimators with respect to a 
postulated prior. This is the case for the basis pursuit and lasso 
estimators used in compressed sensing. 

A. Replica Method 

Our analysis is based on a powerful but non-rigorous 
technique from statistical physics known as the replica method. 
The replica method was originally developed by Edwards and 
Anderson [1] to study the statistical mechanics of spin glasses. 
Although not fully rigorous from the perspective of probability 
theory, the technique was able to provide explicit solutions for 
a range of complex problems where many other methods had 
previously failed. Indeed, the replica method and related ideas 
from statistical mechanics have found success in a number 
of classic NP-hard problems including the traveling salesman 
problem [2], graph partitioning [3], k-SAT [4] and others [5].
Statistical physics methods have also been applied to the study 
of error correcting codes [6], [7]. 

The replica method was first applied to the study of nonlin- 
ear MAP estimation problems by Tanaka [8]. He considered 
multiuser detection for large CDMA systems with random 
spreading sequences. Müller [9] considered a mathematically-
similar problem for MIMO communication systems. In the
context of the estimation problem considered here, Tanaka's
and Müller's papers essentially characterized the behavior of
the MAP estimator of a vector x with i.i.d. binary components
observed through linear measurements of the form (1) with a
large random $\Phi$ and Gaussian w.

Tanaka's results were then generalized in a remarkable paper
by Guo and Verdú [10] to vectors x with arbitrary distributions.
Guo and Verdú's result was also able to incorporate
a large class of minimum postulated MSE estimators, where
the estimator may assume a prior that is different from the
actual prior. The result in this paper is the corresponding MAP
statement to Guo and Verdú's result. In fact, our result is
derived from Guo and Verdú's by taking appropriate limits
with large deviations arguments.

The non-rigorous aspect of the replica method involves a 
set of assumptions that include a self-averaging property, the 
validity of a "replica trick," and the ability to exchange certain 
limits. Some progress has been made in formally proving these 
assumptions; a survey of this work can be found in [11]. 
Also, some of the predictions of the replica method have been 
validated rigorously by other means. For example, Montanari 
and Tse [12] have confirmed Tanaka's formula in certain 
regimes using density evolution and belief propagation. 

To emphasize our dependence on these unproven assumptions,
we will refer to Guo and Verdú's result as the Replica
MMSE Claim. Our main result, which depends on Guo and
Verdú's analysis, will be called the Replica MAP Claim.

B. Applications to Compressed Sensing 

As an application of our main result, we will develop a 
few analyses of estimation problems that arise in compressed 
sensing [13]-[15]. In compressed sensing, one estimates a 
sparse vector x from random linear measurements. A vector 
x is sparse when its number of nonzero entries k is smaller
than its length n. Generically, optimal estimation of x with a 
sparse prior is NP-hard [16]. Thus, most attention has focused 
on greedy heuristics such as matching pursuit [17]-[20] and 
convex relaxations such as basis pursuit [21] or lasso [22]. 
While successful in practice, these algorithms are difficult to 
analyze precisely. 

Compressed sensing of sparse x through (1) (using inner
products with rows of $\Phi$) is mathematically identical to
sparse approximation of y with respect to columns of $\Phi$.
An important set of results for both sparse approximation and
compressed sensing are the deterministic conditions on the
coherence of $\Phi$ that are sufficient to guarantee good performance
of the suboptimal methods mentioned above [23]-[25]. These
conditions can be satisfied with high probability for certain 
large random measurement matrices. Compressed sensing has 
provided many sufficient conditions that are easier to satisfy 
than the initial coherence-based conditions. However, despite 
this progress, the exact performance of most sparse estimators 
is still not known precisely, even in the asymptotic case of 
large random measurement matrices. Most results describe the 
estimation performance via bounds, and the tightness of these 
bounds is generally not known. 

There are, of course, notable exceptions including [26] 
and [27] which provide matching necessary and sufficient 
conditions for recovery of strictly sparse vectors with basis 
pursuit and lasso. However, even these results only consider
exact recovery and are limited to measurements that are noise- 
free or measurements with a signal-to-noise ratio (SNR) that 
scales to infinity. 

Many common sparse estimators can be seen as MAP 
estimators with certain postulated priors. Most importantly, 
lasso and basis pursuit are MAP estimators assuming a Lapla- 
cian prior. Other commonly-used sparse estimation algorithms, 
including linear estimation with and without thresholding and 
zero norm-regularized estimators, can also be seen as MAP- 
based estimators. For these MAP-based sparse estimation 
algorithms, we can apply the replica method to provide a novel 
analysis with a number of important features: 

• Asymptotic exactness: Most importantly, the replica
method provides — under the assumption of the replica 
hypotheses — not just bounds, but the exact asymptotic 
behavior of MAP-based sparse estimators. This in turn
permits exact expressions for the various performance 
metrics such as MSE or fraction of support recovery. The 
expressions apply for arbitrary ratios k/n, n/m and SNR. 

• Connections to thresholding: The scalar model provided 
by the Replica MAP Claim is appealing in that it reduces 
the analysis of a complicated vector-valued estimation 
problem to a simple equivalent scalar model. This model 
is particularly simple for lasso estimation. In this case, the 
replica analysis shows that the asymptotic behavior of the 
lasso estimate of any component of x is equivalent to that 
component being corrupted by Gaussian noise and soft- 
thresholded. Similarly, zero norm-regularized estimation 
is equivalent to hard thresholding. 

• Application to arbitrary distributions: The replica analy- 
sis can incorporate arbitrary distributions on x including 
several sparsity models, such as Laplacian, generalized 
Gaussian and Gaussian mixture priors. Discrete distribu- 
tions can also be studied. 

It should be pointed out that this work is not the first 
to use ideas from statistical physics for the study of sparse 
estimation. Merhav, Guo and Shamai [28] consider, among 
other applications, the estimation of a sparse vector x, through 
measurements of the form y = x + w. In their model, 
there is no measurement matrix such as $\Phi$ in (1), but the
components of x are possibly correlated. Their work derives 
explicit expressions for the MMSE as a function of the 
probability distribution on the number of nonzero components. 
The analysis does not rely on replica assumptions and is 
fully rigorous. More recently, Kabashima, Wadayama and 
Tanaka [29] have used the replica method to derive precise 
conditions on which sparse signals can be recovered with
$\ell_p$-based relaxations such as lasso. Their analysis does not
consider noise, but can find conditions for recovery of the
entire vector x, not just individual components.

C. Outline 

The remainder of the paper is organized as follows. The
precise estimation problem is described in Section II. We
review the Replica MMSE Claim of Guo and Verdú in
Section III. We then present our main result, the Replica
MAP Claim, in Section IV. The results are applied to the
analysis of compressed sensing algorithms in Section V, which
is followed by numerical simulations in Section VI. Future
work is given in Section VII. The proof of the main result is
somewhat long and given in a set of appendices; Appendix I
provides an overview of the proof and a guide through the
appendices with detailed arguments.



II. Estimation Problem and Assumptions 

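Consider the estimation of a random vector $\mathbf{x} \in \mathbb{R}^n$ from
linear measurements of the form

$$\mathbf{y} = \Phi \mathbf{x} + \mathbf{w} = \mathbf{A}\mathbf{S}^{1/2}\mathbf{x} + \mathbf{w}, \qquad (3)$$

where $\mathbf{y} \in \mathbb{R}^m$ is a vector of observations, $\Phi = \mathbf{A}\mathbf{S}^{1/2}$,
$\mathbf{A} \in \mathbb{R}^{m \times n}$ is a measurement matrix, $\mathbf{S}$ is a diagonal matrix of
positive scale factors,

$$\mathbf{S} = \mathrm{diag}(s_1, \ldots, s_n), \qquad s_j > 0, \qquad (4)$$

and $\mathbf{w} \in \mathbb{R}^m$ is zero-mean, white Gaussian noise. We consider
a sequence of such problems indexed by n, with $n \to \infty$. For
each n, the problem is to determine an estimate $\hat{\mathbf{x}}$ of $\mathbf{x}$ from
the observations $\mathbf{y}$ knowing the measurement matrix $\mathbf{A}$ and
scale factor matrix $\mathbf{S}$.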

The components $x_j$ of $\mathbf{x}$ are modeled as zero mean and
i.i.d. with some prior probability distribution $p_0(x_j)$. The
per-component variance of the Gaussian noise is $\mathbf{E}|w_j|^2 = \sigma_0^2$.
We use the subscript "0" on the prior and noise level to
differentiate these quantities from certain "postulated" values
to be defined later. When we develop applications, the prior
$p_0(x_j)$ will incorporate presumed sparsity of the components
of $\mathbf{x}$.

In (3), we have factored $\Phi = \mathbf{A}\mathbf{S}^{1/2}$ so that even with
the i.i.d. assumption on the $x_j$s above and an i.i.d. assumption on
entries of $\mathbf{A}$, the model can capture variations in powers of
the components of $\mathbf{x}$ that are known a priori at the estimator.
Specifically, multiplication by $\mathbf{S}^{1/2}$ scales the variance of the
jth component of $\mathbf{x}$ by a factor $s_j$. Variations in the power of
$\mathbf{x}$ that are not known to the estimator should be captured in
the distribution of $\mathbf{x}$.

We summarize the situation and make additional assump- 
tions to specify the problem precisely as follows: 

(a) The number of measurements m = m(n) is a deterministic
quantity that varies with n and satisfies

$$\lim_{n \to \infty} n/m(n) = \beta$$

for some $\beta > 0$. (The dependence of m on n is usually
omitted for brevity.)

(b) The components $x_j$ of $\mathbf{x}$ are i.i.d. with probability
distribution $p_0(x_j)$.

(c) The noise $\mathbf{w}$ is Gaussian with $\mathbf{w} \sim \mathcal{N}(0, \sigma_0^2 I_m)$.

(d) The components of the matrix $\mathbf{A}$ are i.i.d. zero mean with
variance 1/m.

(e) The scale factors $s_j$ are i.i.d. and satisfy $s_j > 0$ almost
surely.

(f) The scale factor matrix $\mathbf{S}$, measurement matrix $\mathbf{A}$, vector
$\mathbf{x}$ and noise $\mathbf{w}$ are all independent.



III. Review of the Replica MMSE Claim 

We begin by reviewing the Replica MMSE Claim of Guo 
and Verdu [10]. 

A. Minimum Postulated MSE Estimators

The Replica MMSE Claim concerns the asymptotic behav- 
ior of estimators that minimize MSE under certain postulated 
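prior distributions. To define the concept, suppose one is given
a "postulated" prior distribution $p_{\rm post}$ and a postulated noise
level $\sigma^2_{\rm post}$ that may be different from the true values $p_0$
and $\sigma_0^2$. We define the minimum postulated MSE (MPMSE)
estimate of $\mathbf{x}$ as

$$\hat{\mathbf{x}}^{\rm mpmse}(\mathbf{y}) = \mathbf{E}\left(\mathbf{x} \mid \mathbf{y} \,; p_{\rm post}, \sigma^2_{\rm post}\right) = \int \mathbf{x} \, p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y} \,; p_{\rm post}, \sigma^2_{\rm post}) \, d\mathbf{x}, \qquad (5)$$

where $p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y} \,; q, \sigma^2)$ is the conditional distribution of $\mathbf{x}$
given $\mathbf{y}$ under the $\mathbf{x}$ distribution and noise variance specified
as parameters after the semicolon. We will use this sort of
notation throughout the rest of the paper, including the use of p
without a subscript for the p.d.f. of the scalar or vector quantity
understood from context. In this case, due to the Gaussianity
of the noise, we have

$$p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y} \,; q, \sigma^2) = C^{-1} \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{A}\mathbf{S}^{1/2}\mathbf{x}\|^2\right) q(\mathbf{x}), \qquad (6)$$

where the normalization constant is

$$C = \int \exp\left(-\frac{1}{2\sigma^2}\|\mathbf{y} - \mathbf{A}\mathbf{S}^{1/2}\mathbf{x}\|^2\right) q(\mathbf{x}) \, d\mathbf{x}$$

and $q(\mathbf{x})$ is the joint p.d.f.

$$q(\mathbf{x}) = \prod_{j=1}^{n} q(x_j).$$

In the case when $p_{\rm post} = p_0$ and $\sigma^2_{\rm post} = \sigma_0^2$, so that
the postulated and true values agree, the MPMSE estimator
reduces to the true MMSE estimate.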
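As an illustration of (5) and (6), the following minimal sketch evaluates the MPMSE estimate by brute-force enumeration for a very small problem. The three-point support {-1, 0, 1}, the two priors, the problem sizes, and the function name `mpmse_estimate` are all assumptions made for this example and do not come from the paper; the enumeration is exponential in n and is meant only to make the definition concrete.

```python
import itertools
import math
import random

def mpmse_estimate(y, A, s, support, q, sigma2):
    """Posterior mean (5) under a separable discrete prior q and noise variance sigma2.

    The integral in (5) becomes a sum over all len(support)**n candidate vectors.
    """
    m, n = len(A), len(A[0])
    numerator = [0.0] * n
    normalizer = 0.0
    for x in itertools.product(support, repeat=n):
        # Squared residual ||y - A S^{1/2} x||^2
        r2 = 0.0
        for i in range(m):
            pred = sum(A[i][j] * math.sqrt(s[j]) * x[j] for j in range(n))
            r2 += (y[i] - pred) ** 2
        # Unnormalized posterior weight from (6): Gaussian likelihood times prior
        weight = math.exp(-r2 / (2.0 * sigma2))
        for xj in x:
            weight *= q[xj]
        normalizer += weight
        for j in range(n):
            numerator[j] += weight * x[j]
    return [v / normalizer for v in numerator]

random.seed(0)
n, m = 3, 4                                   # illustrative sizes only
support = (-1.0, 0.0, 1.0)
p0 = {-1.0: 0.1, 0.0: 0.8, 1.0: 0.1}          # assumed "true" sparse prior
p_post = {-1.0: 1 / 3, 0.0: 1 / 3, 1.0: 1 / 3}  # assumed mismatched postulated prior
sigma0_2 = 0.1

A = [[random.gauss(0.0, math.sqrt(1.0 / m)) for _ in range(n)] for _ in range(m)]
s = [1.0] * n
x = random.choices(support, weights=[p0[v] for v in support], k=n)
y = [sum(A[i][j] * math.sqrt(s[j]) * x[j] for j in range(n))
     + random.gauss(0.0, math.sqrt(sigma0_2)) for i in range(m)]

x_mmse = mpmse_estimate(y, A, s, support, p0, sigma0_2)   # postulated = true values
x_mpmse = mpmse_estimate(y, A, s, support, p_post, 0.2)   # postulated != true values
```

When the postulated prior and noise level match the true ones, the same routine returns the true MMSE estimate, as stated above.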

B. Replica MMSE Claim 

The essence of the Replica MMSE Claim is that the 
asymptotic behavior of the MPMSE estimator is described 
by an equivalent scalar estimator. Let $q(x)$ be a probability
distribution defined on some set $\mathcal{X} \subseteq \mathbb{R}$. Given $\mu > 0$, let
$p_{x|z}(x \mid z \,; q, \mu)$ be the conditional distribution

$$p_{x|z}(x \mid z \,; q, \mu) = \left[\int_{x \in \mathcal{X}} \phi(z - x \,; \mu)\, q(x)\, dx\right]^{-1} \phi(z - x \,; \mu)\, q(x) \qquad (7)$$

where $\phi(\cdot)$ is the Gaussian distribution

$$\phi(v \,; \mu) = \frac{1}{\sqrt{2\pi\mu}}\, e^{-|v|^2/(2\mu)}. \qquad (8)$$

The distribution $p_{x|z}(x \mid z \,; q, \mu)$ is the conditional distribution
of the scalar random variable $x \sim q(x)$ from an observation
of the form

$$z = x + \sqrt{\mu}\, v, \qquad (9)$$

where $v \sim \mathcal{N}(0, 1)$. Using this distribution, we can define the
scalar conditional MMSE estimate,

$$\hat{x}^{\rm mmse}_{\rm scalar}(z \,; q, \mu) = \int_{x \in \mathcal{X}} x \, p_{x|z}(x \mid z \,; q, \mu)\, dx. \qquad (10)$$

[Fig. 1. Equivalent scalar model for the estimator behavior predicted by the
Replica MMSE Claim. Block labels: $x \sim p_0(x)$, $v \sim \mathcal{N}(0,1)$, $\sqrt{\mu}$, $z$, $\hat{x}^{\rm mmse}_{\rm scalar}(\,\cdot\, ; p_{\rm post}, \mu_p)$.]

Also, given two distributions, $p_0(x)$ and $p_1(x)$, and two noise
levels, $\mu_0 > 0$ and $\mu_1 > 0$, define

$$\mathrm{mse}(p_1, p_0, \mu_1, \mu_0, z) = \int_{x \in \mathcal{X}} \left|x - \hat{x}^{\rm mmse}_{\rm scalar}(z \,; p_1, \mu_1)\right|^2 p_{x|z}(x \mid z \,; p_0, \mu_0)\, dx, \qquad (11)$$

which is the MSE in estimating the scalar x from the variable
z in (9) when x has a true distribution $x \sim p_0(x)$ and the
noise level is $\mu = \mu_0$, but the estimator assumes a distribution
$x \sim p_1(x)$ and noise level $\mu = \mu_1$.

Replica MMSE Claim [10]: Consider the estimation problem
in Section II. Let $\hat{\mathbf{x}}^{\rm mpmse}(\mathbf{y})$ be the MPMSE estimator
based on a postulated prior $p_{\rm post}$ and postulated noise level
$\sigma^2_{\rm post}$. For each n, let j = j(n) be some deterministic
component index with $j(n) \in \{1, \ldots, n\}$. Then there exist
effective noise levels $\sigma^2_{\rm eff}$ and $\sigma^2_{\rm p\text{-}eff}$ such that:

(a) As $n \to \infty$, the random vectors $(x_j, s_j, \hat{x}_j^{\rm mpmse})$ converge
in distribution to the random vector $(x, s, \hat{x})$ shown
in Fig. 1. Here, x, s, and v are independent with $x \sim p_0(x)$,
$s \sim p_S(s)$, $v \sim \mathcal{N}(0, 1)$, and

$$\hat{x} = \hat{x}^{\rm mmse}_{\rm scalar}(z \,; p_{\rm post}, \mu_p) \qquad (12a)$$
$$z = x + \sqrt{\mu}\, v \qquad (12b)$$

where $\mu = \sigma^2_{\rm eff}/s$ and $\mu_p = \sigma^2_{\rm p\text{-}eff}/s$.

(b) The effective noise levels satisfy the equations

$$\sigma^2_{\rm eff} = \sigma_0^2 + \beta\, \mathbf{E}\left[s \, \mathrm{mse}(p_{\rm post}, p_0, \mu_p, \mu, z)\right] \qquad (13a)$$
$$\sigma^2_{\rm p\text{-}eff} = \sigma^2_{\rm post} + \beta\, \mathbf{E}\left[s \, \mathrm{mse}(p_{\rm post}, p_{\rm post}, \mu_p, \mu_p, z)\right] \qquad (13b)$$

where the expectations are taken over $s \sim p_S(s)$ and z
generated by (12b).

The Replica MMSE Claim asserts that the asymptotic
behavior of the joint estimation of the n-dimensional vector
$\mathbf{x}$ can be described by n equivalent scalar estimators. In
the scalar estimation problem, a component $x \sim p_0(x)$ is
corrupted by additive Gaussian noise yielding a noisy measurement
z. The additive noise variance is $\mu = \sigma^2_{\rm eff}/s$, which is the
effective noise divided by the scale factor s. The estimate of
that component is then described by the (generally nonlinear)
scalar estimator $\hat{x}^{\rm mmse}_{\rm scalar}(z \,; p_{\rm post}, \mu_p)$.

The effective noise levels $\sigma^2_{\rm eff}$ and $\sigma^2_{\rm p\text{-}eff}$ are described by
the solutions to fixed-point equations (13). Note that $\sigma^2_{\rm eff}$ and
$\sigma^2_{\rm p\text{-}eff}$ appear implicitly on the left- and right-hand sides of
these equations via the terms $\mu$ and $\mu_p$. In general, there
is no closed form solution to these equations. However, the
expectations can be evaluated via numerical integration.

It is important to point out that there may, in general, be
multiple solutions to the fixed-point equations (13). In this
case, it turns out that the true solution is the minimizer of a
certain Gibbs' function described in [10].

C. Effective Noise and Multiuser Efficiency

To understand the significance of the effective noise level
$\sigma^2_{\rm eff}$, it is useful to consider the following estimation problem
with side information. Suppose that when estimating the
component $x_j$ an estimator is given as side information the
values of all the other components $x_\ell$, $\ell \neq j$. Then, this
hypothetical estimator with side information can "subtract out"
the effect of all the known components and compute

$$z_j = \frac{1}{\sqrt{s_j}\,\|\mathbf{a}_j\|^2}\, \mathbf{a}_j' \left(\mathbf{y} - \sum_{\ell \neq j} \sqrt{s_\ell}\, \mathbf{a}_\ell x_\ell\right),$$

where $\mathbf{a}_\ell$ is the $\ell$th column of the measurement matrix $\mathbf{A}$. It
is easily checked that

$$z_j = x_j + \frac{1}{\sqrt{s_j}\,\|\mathbf{a}_j\|^2}\, \mathbf{a}_j' \mathbf{w} = x_j + \sqrt{\mu_0}\, v_j, \qquad (14)$$

where

$$v_j = \frac{1}{\sigma_0 \|\mathbf{a}_j\|^2}\, \mathbf{a}_j' \mathbf{w}, \qquad \mu_0 = \frac{\sigma_0^2}{s_j}.$$

Thus, (14) shows that with side information, estimation of $x_j$
reduces to a scalar estimation problem where $x_j$ is corrupted
by additive noise $v_j$. Since $\mathbf{w}$ is Gaussian with mean zero and
per-component variance $\sigma_0^2$, $v_j$ is Gaussian with mean zero and
variance $1/\|\mathbf{a}_j\|^2$. Also, since $\mathbf{a}_j$ is an m-dimensional vector
whose components are i.i.d. with variance 1/m, $\|\mathbf{a}_j\|^2 \to 1$
as $m \to \infty$. Therefore, for large m, $v_j$ will approach $v_j \sim \mathcal{N}(0, 1)$.

Comparing (14) with (12b), we see that the equivalent scalar
model predicted by the Replica MMSE Claim in (12b) is
identical to the estimation with perfect side information (14),
except that the noise level is increased by a factor

$$1/\eta = \mu/\mu_0 = \sigma^2_{\rm eff}/\sigma_0^2. \qquad (15)$$

In multiuser detection, the factor $\eta$ is called the multiuser
efficiency [30], [31].

The multiuser efficiency can be interpreted as degradation
in the effective signal-to-noise ratio (SNR): With perfect side
information, an estimator using $z_j$ in (14) can estimate $x_j$ with
an effective SNR of

$$\mathrm{SNR}_0(s) = \frac{1}{\mu_0}\, \mathbf{E}|x_j|^2 = \frac{s}{\sigma_0^2}\, \mathbf{E}|x_j|^2. \qquad (16)$$

In CDMA multiuser detection, the factor $\mathrm{SNR}_0(s)$ is called the
post-despreading SNR with no multiple access interference.
The Replica MMSE Claim shows that without side information,
the effective SNR is given by

$$\mathrm{SNR}(s) = \frac{1}{\mu}\, \mathbf{E}|x_j|^2 = \frac{s}{\sigma^2_{\rm eff}}\, \mathbf{E}|x_j|^2. \qquad (17)$$

Therefore, the multiuser efficiency $\eta$ in (15) is the ratio of the
effective SNR with and without perfect side information.
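The side-information argument behind (14) can be checked numerically. The sketch below draws one instance of the model in Section II, subtracts the contribution of every component other than j, and confirms that what remains is $x_j$ plus a scaled noise term. The problem sizes, the three-valued distribution of the scale factors, the Bernoulli-Gaussian draw of x, and the helper name `side_info_observation` are assumptions chosen for this example only.

```python
import math
import random

random.seed(1)
n, m = 50, 100        # illustrative sizes
sigma0 = 0.3          # true noise standard deviation (assumed value)

# Model of Section II: y = A S^{1/2} x + w with i.i.d. entries of variance 1/m in A
A = [[random.gauss(0.0, math.sqrt(1.0 / m)) for _ in range(n)] for _ in range(m)]
s = [random.choice((0.5, 1.0, 2.0)) for _ in range(n)]
x = [random.gauss(0.0, 1.0) if random.random() < 0.1 else 0.0 for _ in range(n)]
w = [random.gauss(0.0, sigma0) for _ in range(m)]
y = [sum(A[i][l] * math.sqrt(s[l]) * x[l] for l in range(n)) + w[i] for i in range(m)]

def side_info_observation(j):
    """Return (z_j, v_j, ||a_j||^2) for an estimator that knows all x_l with l != j."""
    a_j = [A[i][j] for i in range(m)]
    norm2 = sum(v * v for v in a_j)
    # Subtract out the effect of every known component l != j
    resid = [y[i] - sum(math.sqrt(s[l]) * A[i][l] * x[l] for l in range(n) if l != j)
             for i in range(m)]
    z_j = sum(a_j[i] * resid[i] for i in range(m)) / (math.sqrt(s[j]) * norm2)
    v_j = sum(a_j[i] * w[i] for i in range(m)) / (sigma0 * norm2)
    return z_j, v_j, norm2

j = 0
z_j, v_j, norm2 = side_info_observation(j)
mu0 = sigma0 ** 2 / s[j]
# (14) predicts z_j = x_j + sqrt(mu0) * v_j, up to floating-point error
gap = abs(z_j - (x[j] + math.sqrt(mu0) * v_j))
```

For a single draw, `norm2` is only approximately 1; the text's statement that $v_j$ approaches a standard Gaussian refers to the limit of large m.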

D. Replica Assumptions 

As described in Section I-A, the Replica MMSE Claim is
not formally proven. We introduce the following definition
to call attention to when we are explicitly assuming that the
Replica MMSE Claim holds.

Definition 1: Consider the estimation problem in Section II.
A postulated prior $p_{\rm post}$ and noise level $\sigma^2_{\rm post}$ are said to satisfy
the Replica MMSE Claim when the corresponding postulated
MMSE estimator $\hat{\mathbf{x}}^{\rm mpmse}$ satisfies properties (a) and (b) of the
Replica MMSE Claim.

IV. Replica MAP Claim 

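We now turn to MAP estimation. Let $\mathcal{X} \subseteq \mathbb{R}$ be some
(measurable) set and consider an estimator of the form

$$\hat{\mathbf{x}}^{\rm map}(\mathbf{y}) = \arg\min_{\mathbf{x} \in \mathcal{X}^n} \frac{1}{2\gamma}\|\mathbf{y} - \mathbf{A}\mathbf{S}^{1/2}\mathbf{x}\|^2 + \sum_{j=1}^{n} f(x_j), \qquad (18)$$

where $\gamma > 0$ is an algorithm parameter and $f : \mathcal{X} \to \mathbb{R}$
is some scalar-valued, non-negative cost function. We will
assume that the objective function in (18) has a unique
essential minimizer for almost all $\mathbf{y}$.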

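The estimator (18) can be interpreted as a MAP estimator.
To see this, suppose that for u sufficiently large,

$$\int_{\mathbf{x} \in \mathcal{X}^n} e^{-u f(\mathbf{x})}\, d\mathbf{x} < \infty, \qquad (19)$$

where we have overloaded the notation $f(\cdot)$ such that

$$f(\mathbf{x}) = \sum_{j=1}^{n} f(x_j).$$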
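When (19) is satisfied, we can define the prior probability
distribution

$$p_u(\mathbf{x}) = \left[\int_{\mathbf{x} \in \mathcal{X}^n} \exp(-u f(\mathbf{x}))\, d\mathbf{x}\right]^{-1} \exp(-u f(\mathbf{x})). \qquad (20)$$

Also, let

$$\sigma_u^2 = \gamma/u. \qquad (21)$$

Substituting (20) and (21) into (6), we see that

$$p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y} \,; p_u, \sigma_u^2) = C_u \exp\left[-u\left(\frac{1}{2\gamma}\|\mathbf{y} - \mathbf{A}\mathbf{S}^{1/2}\mathbf{x}\|^2 + f(\mathbf{x})\right)\right] \qquad (22)$$

for some constant $C_u$ that does not depend on $\mathbf{x}$. (The scaling
of the noise variance along with $p_u$ enables the factorization
in the exponent of (22).) Comparing to (18), we see that

$$\hat{\mathbf{x}}^{\rm map}(\mathbf{y}) = \arg\max_{\mathbf{x} \in \mathcal{X}^n} p_{\mathbf{x}|\mathbf{y}}(\mathbf{x} \mid \mathbf{y} \,; p_u, \sigma_u^2).$$

Thus for all sufficiently large u, we indeed have a MAP
estimate, assuming the prior $p_u$ and noise level $\sigma_u^2$.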

To analyze this MAP estimator, we consider a sequence of
MMSE estimators. For each u, let

$$\hat{\mathbf{x}}^{u}(\mathbf{y}) = \mathbf{E}\left(\mathbf{x} \mid \mathbf{y} \,; p_u, \sigma_u^2\right), \qquad (23)$$

which is the MMSE estimator of $\mathbf{x}$ under the postulated prior
$p_u$ in (20) and noise level $\sigma_u^2$ in (21). Using a standard
large deviations argument, one can show that under suitable
conditions

$$\lim_{u \to \infty} \hat{\mathbf{x}}^{u}(\mathbf{y}) = \hat{\mathbf{x}}^{\rm map}(\mathbf{y})$$

for all $\mathbf{y}$. A formal proof is given in Appendix III (see
Lemma 4). Under the assumption that the behaviors of the
MMSE estimators are described by the Replica MMSE Claim,
we can then extrapolate the behavior of the MAP estimator.
This will yield our main result, the Replica MAP Claim.
To state the claim, define the scalar MAP estimator

$$\hat{x}^{\rm map}_{\rm scalar}(z \,; \lambda) = \arg\min_{x \in \mathcal{X}} F(x, z, \lambda) \qquad (24)$$

where

$$F(x, z, \lambda) = \frac{1}{2\lambda}|z - x|^2 + f(x). \qquad (25)$$

The estimator (24) plays a similar role as the scalar MMSE
estimator (10).

The Replica MAP Claim pertains to the estimator (18)
applied to the sequence of estimation problems defined in
Section II. Our assumptions are as follows:
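The limiting argument that turns the MMSE estimators (23) into the MAP estimator can be seen numerically in the scalar case. The following sketch takes the illustrative cost f(x) = |x| (an assumption for this example), forms the density proportional to exp(-u F(x, z, λ)) with F as in (25), and compares its mean with the grid minimizer of F as u grows. The grid width, step size, and function names are hypothetical choices, and the grid search stands in for the exact minimization in (24).

```python
import math

def scalar_cost(x, z, lam):
    """F(x, z, lam) from (25) with the illustrative cost f(x) = |x|."""
    return (z - x) ** 2 / (2.0 * lam) + abs(x)

def grid(half_width=8.0, step=1e-3):
    count = int(round(2 * half_width / step)) + 1
    return [-half_width + k * step for k in range(count)]

def scalar_map(z, lam):
    """Grid approximation of the minimizer in (24)."""
    return min(grid(), key=lambda x: scalar_cost(x, z, lam))

def posterior_mean(z, lam, u):
    """Mean of the density proportional to exp(-u * F(x, z, lam)), via a Riemann sum."""
    xs = grid()
    costs = [scalar_cost(x, z, lam) for x in xs]
    c_min = min(costs)  # shift so the largest weight is exp(0) = 1
    weights = [math.exp(-u * (c - c_min)) for c in costs]
    return sum(x * wt for x, wt in zip(xs, weights)) / sum(weights)

z, lam = 2.0, 1.0
x_map = scalar_map(z, lam)
means = {u: posterior_mean(z, lam, u) for u in (1.0, 10.0, 200.0)}
```

As u increases, the density concentrates around the minimizer of F, so the values in `means` should move toward `x_map`. This is only a scalar illustration of the concentration effect; the formal vector statement is the one referenced in the appendix.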

Assumption 1: For all u > 0 sufficiently large, assume the
postulated prior $p_u$ in (20) and noise level $\sigma_u^2$ in (21) satisfy
the Replica MMSE Claim (see Definition 1).

Assumption 2: Let $\sigma^2_{\rm eff}(u)$ and $\sigma^2_{\rm p\text{-}eff}(u)$ be the effective
noise levels when using the prior $p_u$ and noise level $\sigma_u^2$.
Assume the following limits exist:

$$\sigma^2_{\rm eff,map} = \lim_{u \to \infty} \sigma^2_{\rm eff}(u), \qquad \gamma_p = \lim_{u \to \infty} u\, \sigma^2_{\rm p\text{-}eff}(u).$$

Assumption 3: Suppose for each n, $\hat{x}_j^u(n)$ is the MMSE
estimate of the component $x_j$ for some index $j \in \{1, \ldots, n\}$
based on the postulated prior $p_u$ and noise level $\sigma_u^2$. Then,
assume that limits can be interchanged to give the following
equality:

$$\lim_{u \to \infty} \lim_{n \to \infty} \hat{x}_j^u(n) = \lim_{n \to \infty} \lim_{u \to \infty} \hat{x}_j^u(n),$$

where the limits are in distribution.

Assumption 4: For every n, $\mathbf{A}$, and $\mathbf{S}$, assume that for
almost all $\mathbf{y}$, the minimization in (18) achieves a unique
essential minimum. Here, essential should be understood in
the standard measure theoretic sense in that the minimum and
essential infimum agree.

Assumption 5: Assume that f(x) is non-negative and satisfies

$$\lim_{|x| \to \infty} \frac{f(x)}{\log |x|} = \infty,$$

where the limit must hold over all sequences in $\mathcal{X}$ with
$|x| \to \infty$. If $\mathcal{X}$ is compact, this limit is automatically satisfied
(since there are no sequences in $\mathcal{X}$ with $|x| \to \infty$).
[Fig. 2. Equivalent scalar model for the estimator behavior predicted by the
Replica MAP Claim. Block labels: $v \sim \mathcal{N}(0,1)$, $\lambda_p = \gamma_p/s$, $\hat{x}^{\rm map}_{\rm scalar}$.]



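Assumption 6: For all $\lambda \in \mathbb{R}$ and almost all z, the
minimization in (24) has a unique, essential minimum. Moreover,
for all $\lambda$ and almost all z, there exists a $\sigma^2(z, \lambda)$ such that

$$\lim_{x \to \hat{x}} \frac{|x - \hat{x}|^2}{2\left(F(x, z, \lambda) - F(\hat{x}, z, \lambda)\right)} = \sigma^2(z, \lambda), \qquad (27)$$

where $\hat{x} = \hat{x}^{\rm map}_{\rm scalar}(z \,; \lambda)$.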

Assumption 1 is simply stated to again point out that we are
assuming the Replica MMSE Claim is valid. As discussed in
Section III-D, the Replica MMSE Claim has not been formally
proven. We make the additional Assumptions 2-4, which are
also difficult to verify but similar in spirit. Taken together,
Assumptions 1-4 reflect the main limitations of the replica
analysis and precisely state the manner in which the analysis
is non-rigorous.

Assumptions 5 and 6 are technical conditions on the existence
and uniqueness of the MAP estimate. Unlike Assumptions 1-4,
we will verify Assumptions 5 and 6 for the problems
of interest. In fact, we will explicitly calculate $\sigma^2(z, \lambda)$.

We can now state our extension of the Replica MMSE Claim 
to MAP estimation. 

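Replica MAP Claim: Consider the estimation problem in
Section II. Let $\hat{\mathbf{x}}^{\rm map}(\mathbf{y})$ be the MAP estimator (18) defined
for some f(x) and $\gamma > 0$ satisfying Assumptions 1-6. For
each n, let j = j(n) be some deterministic component index
with $j(n) \in \{1, \ldots, n\}$. Then:

(a) As $n \to \infty$, the random vectors $(x_j, s_j, \hat{x}_j^{\rm map})$ converge
in distribution to the random vector $(x, s, \hat{x})$ shown in
Fig. 2 for the limiting effective noise levels $\sigma^2_{\rm eff,map}$ and $\gamma_p$
in Assumption 2. Here, x, s, and v are independent with
$x \sim p_0(x)$, $s \sim p_S(s)$, $v \sim \mathcal{N}(0, 1)$, and

$$\hat{x} = \hat{x}^{\rm map}_{\rm scalar}(z \,; \lambda_p) \qquad (28a)$$
$$z = x + \sqrt{\mu}\, v, \qquad (28b)$$

where $\mu = \sigma^2_{\rm eff,map}/s$ and $\lambda_p = \gamma_p/s$.

(b) The limiting effective noise levels $\sigma^2_{\rm eff,map}$ and $\gamma_p$ satisfy
the equations

$$\sigma^2_{\rm eff,map} = \sigma_0^2 + \beta\, \mathbf{E}\left[s\,|x - \hat{x}|^2\right] \qquad (29a)$$
$$\gamma_p = \gamma + \beta\, \mathbf{E}\left[s\, \sigma^2(z, \lambda_p)\right] \qquad (29b)$$

where the expectations are taken over $x \sim p_0(x)$, $s \sim p_S(s)$,
and $v \sim \mathcal{N}(0, 1)$, with $\hat{x}$ and z defined in (28).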

Analogously to the Replica MMSE Claim, the Replica MAP
Claim asserts that asymptotic behavior of the MAP estimate of
any single component of x is described by a simple equivalent
scalar estimator. In the equivalent scalar model, the component
of the true vector x is corrupted by Gaussian noise and the
estimate of that component is given by a scalar MAP estimate
of the component from the noise-corrupted version.

V. Analysis of Compressed Sensing 

Our results thus far hold for any separable distribution for x
(see Section II) and under mild conditions on the cost function
f (see especially Assumption 5, but other assumptions also
implicitly constrain f). In this section, we provide additional
details on replica analysis for choices of f that yield MAP
estimators relevant to compressed sensing. Since the role of f
is to determine the estimator, this is not the same as choosing
sparse priors for x. Numerical evaluations of asymptotic
performance with sparse priors for x are given in Section VI.

A. Linear Estimation 

We first apply the Replica MAP Claim to the simple case 
of linear estimation. Linear estimators only use second-order 
statistics and generally do not directly exploit sparsity or other 
aspects of the distribution of the unknown vector x. Nonethe- 
less, for sparse estimation problems, linear estimators can be 
used as a first step in estimation, followed by thresholding or 
other nonlinear operations [32], [33]. It is therefore worthwhile 
to analyze the behavior of linear estimators even in the context 
of sparse priors. 

The asymptotic behavior of linear estimators with large random
measurement matrices is well known. For example, using
the Marčenko-Pastur theorem [34], Verdú and Shamai [35]
characterized the behavior of linear estimators with large
i.i.d. matrices A and constant scale factors S = I. Tse
and Hanly [36] extended the analysis to general S. Guo and
Verdú [10] showed that both of these results can be recovered
as special cases of the general Replica MMSE Claim. We
show here that the Replica MAP Claim can also recover 
these results. Although this analysis will not provide any new 
results, walking through the computations will illustrate how 
the Replica MAP Claim is used. 

To simplify the notation, suppose that the true prior on x
is such that each component has zero mean and unit variance.
Choose the cost function

$$f(x) = \frac{1}{2}|x|^2,$$

which corresponds to the negative log of a Gaussian prior also
with zero mean and unit variance. With this cost function, the
MAP estimator (18) reduces to the linear estimator

$$\hat{\mathbf{x}}^{\rm map}(\mathbf{y}) = \mathbf{S}^{1/2}\mathbf{A}'\left(\mathbf{A}\mathbf{S}\mathbf{A}' + \gamma I\right)^{-1}\mathbf{y}. \qquad (30)$$

When $\gamma = \sigma_0^2$, the true noise variance, the estimator (30) is
the linear MMSE estimate.

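Now, let us compute the effective noise levels from the
Replica MAP Claim. First note that $F(x, z, \lambda)$ in (25) is given
by

$$F(x, z, \lambda) = \frac{1}{2\lambda}|z - x|^2 + \frac{1}{2}|x|^2,$$

and therefore the scalar MAP estimator in (24) is given by

$$\hat{x}^{\rm map}_{\rm scalar}(z \,; \lambda) = \frac{1}{1 + \lambda}\, z. \qquad (31)$$

A simple calculation also shows that $\sigma^2(z, \lambda)$ in (27) is given
by

$$\sigma^2(z, \lambda) = \frac{\lambda}{1 + \lambda}. \qquad (32)$$

Now, as in part (a) of the Replica MAP Claim, let $\mu = \sigma^2_{\rm eff,map}/s$
and $\lambda_p = \gamma_p/s$. Observe that

$$\mathbf{E}\left[s\,|x - \hat{x}^{\rm map}_{\rm scalar}(z \,; \lambda_p)|^2\right] \overset{(a)}{=} \mathbf{E}\left[s\,\left|x - \frac{1}{1 + \lambda_p}\, z\right|^2\right] \overset{(b)}{=} \mathbf{E}\left[s\,\left|\frac{\lambda_p}{1 + \lambda_p}\, x - \frac{\sqrt{\mu}}{1 + \lambda_p}\, v\right|^2\right] \overset{(c)}{=} \frac{s(\lambda_p^2 + \mu)}{(1 + \lambda_p)^2}, \qquad (33)$$

where (a) follows from (31); (b) follows from (28b); and (c)
follows from the fact that x and v are uncorrelated with zero
mean and unit variance. Substituting (32) and (33) into the
fixed-point equations (29), we see that the limiting noise levels
$\sigma^2_{\rm eff,map}$ and $\gamma_p$ must satisfy

$$\sigma^2_{\rm eff,map} = \sigma_0^2 + \beta\, \mathbf{E}\left[\frac{s(\lambda_p^2 + \mu)}{(1 + \lambda_p)^2}\right], \qquad \gamma_p = \gamma + \beta\, \mathbf{E}\left[\frac{s \lambda_p}{1 + \lambda_p}\right],$$

where the expectation is over $s \sim p_S(s)$. In the case when
$\gamma = \sigma_0^2$, it can be verified that a solution to these fixed-point
equations is $\sigma^2_{\rm eff,map} = \gamma_p$, which results in $\mu = \lambda_p$ and

$$\sigma^2_{\rm eff,map} = \sigma_0^2 + \beta\, \mathbf{E}\left[\frac{s \lambda_p}{1 + \lambda_p}\right] = \sigma_0^2 + \beta\, \mathbf{E}\left[\frac{s\, \sigma^2_{\rm eff,map}}{s + \sigma^2_{\rm eff,map}}\right]. \qquad (34)$$

The expression (34) is precisely the Tse-Hanly formula [36]
for the effective interference. Given a distribution on s, this
expression can be solved numerically for $\sigma^2_{\rm eff,map}$. In the
special case of constant s, (34) reduces to Verdú and Shamai's
result in [37] and can be solved via a quadratic equation.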
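The fixed-point equation (34) is simple enough to solve in a few lines. In the sketch below, the constant-s case is solved through the quadratic obtained by clearing the denominator in (34), and the general case is handled by plain fixed-point iteration with the expectation over s replaced by an average over a finite list of equally likely values. The function names, the parameter values, and the choice of a uniform discrete distribution on s are assumptions for illustration.

```python
import math

def tse_hanly_quadratic(sigma0_2, beta, s):
    """Positive root of (34) for constant s.

    Clearing the denominator in sig2 = sigma0_2 + beta*s*sig2/(s + sig2) gives
    sig2**2 + b*sig2 - sigma0_2*s = 0 with b = s*(1 - beta) - sigma0_2.
    """
    b = s * (1.0 - beta) - sigma0_2
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * sigma0_2 * s))

def tse_hanly_iterate(sigma0_2, beta, s_values, iters=500):
    """Iterate (34), with E[.] taken as the average over equally likely s_values."""
    sig2 = sigma0_2
    for _ in range(iters):
        avg = sum(s * sig2 / (s + sig2) for s in s_values) / len(s_values)
        sig2 = sigma0_2 + beta * avg
    return sig2

sigma0_2, beta, s = 0.1, 0.5, 1.0
closed_form = tse_hanly_quadratic(sigma0_2, beta, s)
iterated = tse_hanly_iterate(sigma0_2, beta, [s])

# A two-valued scale factor, where no quadratic shortcut is available
mixed = tse_hanly_iterate(sigma0_2, beta, [0.5, 2.0])
```

The right-hand side of (34) is increasing and concave in $\sigma^2_{\rm eff,map}$ with slope below $\beta$, so for $\beta < 1$ the iteration is a contraction; for larger $\beta$ convergence from the starting point $\sigma_0^2$ may be slower and is not checked here.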

The Replica MAP Claim now states that for any component
index j, the asymptotic joint distribution of $(x_j, s_j, \hat{x}_j)$ is
described by $x_j$ corrupted by additive Gaussian noise with
variance $\sigma^2_{\rm eff,map}/s$ followed by a scalar linear estimator.

As described in [10], the above analysis can also be applied
to other linear estimators including the matched filter (where
$\gamma \to \infty$) or the decorrelating receiver ($\gamma \to 0$).

B. Lasso Estimation 

We next consider lasso estimation, which is widely used 
for estimation of sparse vectors. The lasso estimate [22] 
(sometimes referred to as basis pursuit denoising [21]) is given 
by 



$$\hat{x}^{\mathrm{lasso}}(y) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2\gamma}\,\|y - AS^{1/2}x\|_2^2 + \|x\|_1, \tag{35}$$

where $\gamma > 0$ is an algorithm parameter. The estimator is essentially a least-squares estimator with an additional $\|x\|_1$ regularization term to encourage sparsity in the solution. The parameter $\gamma$ is selected to trade off the sparsity of the estimate with the prediction error. An appealing feature of lasso estimation is that the minimization in (35) is convex; lasso thus enables computationally tractable algorithms for finding sparse estimates.

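The lasso estimator (35) is identical to the MAP estimator (18) with the cost function

$$f(x) = |x|.$$

With this cost function, $F(x,z,\lambda)$ in (25) is given by

$$F(x,z,\lambda) = \frac{1}{2\lambda}\,|z - x|^2 + |x|,$$

and therefore the scalar MAP estimator in (24) is given by

$$\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\lambda) = T^{\mathrm{soft}}_{\lambda}(z), \tag{36}$$

where $T^{\mathrm{soft}}_{\lambda}(z)$ is the soft thresholding operator

$$T^{\mathrm{soft}}_{\lambda}(z) = \begin{cases} z - \lambda, & \text{if } z > \lambda; \\ 0, & \text{if } |z| \le \lambda; \\ z + \lambda, & \text{if } z < -\lambda. \end{cases} \tag{37}$$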
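The reduction of the scalar MAP estimator to soft thresholding can be checked numerically. The sketch below implements (37) and compares it against a brute-force grid minimization of $F(x,z,\lambda)$; the function names and the grid-search helper are illustrative choices for this example only.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft thresholding operator of (37)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def scalar_map_brute_force(z, lam, cost, grid):
    """Minimize F(x, z, lam) = |z - x|^2 / (2*lam) + cost(x) over a grid of x."""
    objective = (z - grid) ** 2 / (2.0 * lam) + cost(grid)
    return grid[np.argmin(objective)]
```

With `cost=np.abs` and a fine grid, the brute-force minimizer should match `soft_threshold(z, lam)` up to the grid spacing.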



The Replica MAP Claim now states that there exist effective noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ such that for any component index $j$, the random vector $(x_j, s_j, \hat{x}_j)$ converges in distribution to the vector $(x, s, \hat{x})$ where $x \sim p_0(x)$, $s \sim p_S(s)$, and $\hat{x}$ is given by

$$\hat{x} = T^{\mathrm{soft}}_{\lambda_p}(z), \qquad z = x + \sqrt{\mu}\,v, \tag{38}$$

where $v \sim \mathcal{N}(0,1)$, $\lambda_p = \gamma_p/s$, and $\mu = \sigma^2_{\mathrm{eff,map}}/s$. Hence, the asymptotic behavior of lasso has a remarkably simple description: the asymptotic distribution of the lasso estimate $\hat{x}_j$ of the component $x_j$ is identical to $x_j$ being corrupted by Gaussian noise and then soft-thresholded to yield the estimate $\hat{x}_j$.

This soft-threshold description has an appealing interpretation. Consider the case when the measurement matrix $A = I$. In this case, the lasso estimator (35) reduces to $n$ scalar estimates,

$$\hat{x}_j = T^{\mathrm{soft}}_{\lambda}\big(x_j + \sqrt{\mu_0}\,v_j\big), \qquad j = 1, 2, \ldots, n, \tag{39}$$

where $v_j \sim \mathcal{N}(0,1)$, $\lambda = \gamma/s$, and $\mu_0 = \sigma_0^2/s$. Comparing (38) and (39), we see that the asymptotic distribution of $(x_j, s_j, \hat{x}_j)$ with large random $A$ is identical to the distribution in the trivial case where $A = I$, except that the noise levels $\gamma$ and $\sigma_0^2$ are replaced by effective noise levels $\gamma_p$ and $\sigma^2_{\mathrm{eff,map}}$.

To calculate the effective noise levels, one can perform a simple calculation to show that $\sigma^2(z,\lambda)$ in (27) is given by

$$\sigma^2(z,\lambda) = \begin{cases} \lambda, & \text{if } |z| > \lambda; \\ 0, & \text{if } |z| \le \lambda. \end{cases} \tag{40}$$

Hence,

$$\mathbf{E}\left[s\,\sigma^2(z,\lambda_p)\right] = \mathbf{E}\left[s\lambda_p \Pr(|z| > \lambda_p)\right] = \gamma_p \Pr(|z| > \gamma_p/s), \tag{41}$$

where we have used the fact that $\lambda_p = \gamma_p/s$. Substituting (36) and (41) into (29), we obtain the fixed-point equations

$$\sigma^2_{\mathrm{eff,map}} = \sigma_0^2 + \beta\,\mathbf{E}\left[s\,\big|x - T^{\mathrm{soft}}_{\lambda_p}(z)\big|^2\right], \tag{42a}$$

$$\gamma_p = \gamma + \beta\gamma_p \Pr(|z| > \gamma_p/s), \tag{42b}$$

where the expectations are taken with respect to $x \sim p_0(x)$, $s \sim p_S(s)$, and $z$ in (38). Again, while these fixed-point equations do not have a closed-form solution, they can be solved numerically with relative ease given distributions of $x$ and $s$.
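One possible numerical approach to (42) is sketched below under several assumptions that are not in the paper: a Bernoulli-Gaussian prior with sparsity `rho`, constant scale factor $s = 1$, expectations replaced by Monte Carlo averages over a fixed set of samples, and the system parametrized by $\gamma_p$ rather than $\gamma$. For a given $\gamma_p$, (42a) is iterated to a fixed point in $\sigma^2_{\mathrm{eff,map}}$, and the corresponding $\gamma$ is then read off from (42b). The function name and defaults are hypothetical.

```python
import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lasso_fixed_point(gamma_p, beta, sigma0_sq, rho=0.1,
                      n_samples=100_000, n_iter=100, seed=0):
    """Return (sigma_eff_sq, gamma) for a given gamma_p, assuming s = 1.

    Expectations in (42a) and (42b) are approximated by sample averages over
    fixed draws of x (Bernoulli-Gaussian, an assumed prior) and v ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples) * (rng.random(n_samples) < rho)
    v = rng.standard_normal(n_samples)
    sigma_sq = sigma0_sq + beta * np.mean(x ** 2)  # high initial condition
    for _ in range(n_iter):
        z = x + np.sqrt(sigma_sq) * v              # scalar channel, mu = sigma_sq
        mse = np.mean((x - soft_threshold(z, gamma_p)) ** 2)
        sigma_sq = sigma0_sq + beta * mse          # update from (42a)
    z = x + np.sqrt(sigma_sq) * v
    # Rearranged (42b): gamma = gamma_p * (1 - beta * Pr(|z| > gamma_p)).
    gamma = gamma_p * (1.0 - beta * np.mean(np.abs(z) > gamma_p))
    return sigma_sq, gamma
```

Parametrizing by $\gamma_p$ keeps the iteration one-dimensional; a returned `gamma` that is not positive would indicate that the chosen $\gamma_p$ does not correspond to a valid regularization parameter.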



C. Zero Norm-Regularized Estimation 

Lasso can be regarded as a convex relaxation of zero norm- 
regularized estimation 



$$\hat{x}^{\mathrm{zero}}(y) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2\gamma}\,\|y - AS^{1/2}x\|_2^2 + \|x\|_0, \tag{43}$$

where $\|x\|_0$ is the number of nonzero components of $x$. For
certain strictly sparse priors, zero norm-regularized estimation 
may provide better performance than lasso. While computing 
the zero norm-regularized estimate is generally very difficult, 
we can use the replica analysis to provide a simple character- 
ization of its performance. This analysis can provide a bound 
on the achievable performance by practical algorithms. 

To apply the Replica MAP Claim to the zero norm-regularized estimator (43), we observe that the zero norm-regularized estimator is identical to the MAP estimator (18) with the cost function

$$f(x) = \begin{cases} 0, & \text{if } x = 0; \\ 1, & \text{if } x \neq 0. \end{cases} \tag{44}$$

Technically, this cost function does not satisfy the conditions of the Replica MAP Claim. For one thing, without bounding the range of $x$, the bound (19) is not satisfied. Also, the minimum of (24) does not agree with the essential infimum. To avoid this problem, we can consider an approximation of (44),

$$f_{\delta,M}(x) = \begin{cases} 0, & \text{if } |x| < \delta; \\ 1, & \text{if } |x| \in [\delta, M], \end{cases}$$

which is defined on the set $\mathcal{X} = \{x : |x| \le M\}$. We can then take the limits $\delta \to 0$ and $M \to \infty$. For space considerations and to simplify the presentation, we will just apply the Replica MAP Claim with $f(x)$ in (44) and omit the details in taking the appropriate limits.

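With $f(x)$ given by (44), the scalar MAP estimator in (24) is given by

$$\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\lambda) = T^{\mathrm{hard}}_{t}(z), \qquad t = \sqrt{2\lambda}, \tag{45}$$

where $T^{\mathrm{hard}}_{t}$ is the hard thresholding operator,

$$T^{\mathrm{hard}}_{t}(z) = \begin{cases} z, & \text{if } |z| > t; \\ 0, & \text{if } |z| \le t. \end{cases} \tag{46}$$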
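The threshold $t = \sqrt{2\lambda}$ in (45) follows from comparing the two candidate minimizers of $F(x,z,\lambda)$ with the cost (44): $x = 0$ costs $z^2/(2\lambda)$, while $x = z$ costs 1. The sketch below encodes this comparison directly and checks it against the hard thresholding operator (46); the function names are chosen for this example only.

```python
import numpy as np


def hard_threshold(z, t):
    """Hard thresholding operator of (46)."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) > t, z, 0.0)


def zero_norm_scalar_map(z, lam):
    """Scalar MAP estimate for the cost (44), by comparing the two candidates."""
    cost_if_zero = z ** 2 / (2.0 * lam)  # F(0, z, lam): quadratic term only
    cost_if_kept = 1.0                   # F(z, z, lam): penalty f(z) = 1 only
    return z if cost_if_kept < cost_if_zero else 0.0
```

For any $z$ and $\lambda > 0$, `zero_norm_scalar_map(z, lam)` should equal `hard_threshold(z, np.sqrt(2 * lam))`.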



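Now, similar to the case of lasso estimation, the Replica MAP Claim states there exist effective noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ such that for any component index $j$, the random vector $(x_j, s_j, \hat{x}_j)$ converges in distribution to the vector $(x, s, \hat{x})$ where $x \sim p_0(x)$, $s \sim p_S(s)$, and $\hat{x}$ is given by

$$\hat{x} = T^{\mathrm{hard}}_{t}(z), \qquad z = x + \sqrt{\mu}\,v, \tag{47}$$

where $v \sim \mathcal{N}(0,1)$, $\lambda_p = \gamma_p/s$, $\mu = \sigma^2_{\mathrm{eff,map}}/s$, and

$$t = \sqrt{2\lambda_p} = \sqrt{2\gamma_p/s}. \tag{48}$$

Thus, the zero norm-regularized estimation of a vector $x$ is equivalent to $n$ scalar components corrupted by some effective noise level $\sigma^2_{\mathrm{eff,map}}$ and hard-thresholded based on an effective noise level $\gamma_p$.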

The fixed-point equations for the effective noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ can be computed similarly to the case of lasso. Specifically, one can verify that (40) and (41) are both satisfied for the hard thresholding operator as well. Substituting (41) and (45) into (29), we obtain the fixed-point equations

$$\sigma^2_{\mathrm{eff,map}} = \sigma_0^2 + \beta\,\mathbf{E}\left[s\,\big|x - T^{\mathrm{hard}}_{t}(z)\big|^2\right], \tag{49a}$$

$$\gamma_p = \gamma + \beta\gamma_p \Pr(|z| > t), \tag{49b}$$

where the expectations are taken with respect to $x \sim p_0(x)$, $s \sim p_S(s)$, $z$ in (47), and $t$ given by (48). These fixed-point equations can be solved numerically.



D. Optimal Regularization 

The lasso estimator (35) and zero norm-regularized estimator (43) require the setting of a regularization parameter $\gamma$.
Qualitatively, the parameter provides a mechanism to trade 
off the sparsity level of the estimate with the prediction error. 
One of the benefits of the replica analysis is that it provides 
a simple mechanism for optimizing the parameter level given 
the problem statistics. 

Consider first the lasso estimator (35) with some $\beta > 0$ and distributions $x \sim p_0(x)$ and $s \sim p_S(s)$. Observe that there exists a solution to (42b) with $\gamma > 0$ if and only if

$$\Pr(|z| > \gamma_p/s) < 1/\beta. \tag{50}$$

This leads to a natural optimization: we consider an optimization over two variables $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$, where we minimize $\sigma^2_{\mathrm{eff,map}}$ subject to (42a) and (50).

One simple procedure for performing this minimization is as follows: We start with $t = 0$ and some initial value of $\sigma^2_{\mathrm{eff,map}}(0)$. For any iteration $t \ge 0$, we update $\sigma^2_{\mathrm{eff,map}}(t)$ with the minimization

$$\sigma^2_{\mathrm{eff,map}}(t+1) = \sigma_0^2 + \beta \min_{\gamma_p} \mathbf{E}\left[s\,\big|x - T^{\mathrm{soft}}_{\lambda_p}(z)\big|^2\right], \tag{51}$$

where, on the right-hand side, the expectation is taken over $x \sim p_0(x)$, $s \sim p_S(s)$, $z$ in (38), $\mu = \sigma^2_{\mathrm{eff,map}}(t)/s$, and $\lambda_p = \gamma_p/s$. The minimization in (51) is over $\gamma_p > 0$ subject to (50). One can show that with a sufficiently high initial condition, the sequence $\sigma^2_{\mathrm{eff,map}}(t)$ monotonically decreases to a local minimum of the objective function. Given the final value for $\gamma_p$, one can then recover $\gamma$ from (42b). A similar procedure can be used for the zero norm-regularized estimator.
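The iteration (51) could be sketched as follows. This is an illustrative implementation only: it assumes a Bernoulli-Gaussian prior, constant $s = 1$, Monte Carlo averages in place of exact expectations, and a simple grid search over $\gamma_p$; none of these choices, nor the function and parameter names, come from the paper.

```python
import numpy as np


def soft_threshold(z, lam):
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def optimize_lasso_regularization(beta, sigma0_sq, rho=0.1, n_samples=20_000,
                                  n_iter=20, seed=0):
    """Iterate (51) with s = 1; returns (sigma_eff_sq, gamma_p, gamma, history)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_samples) * (rng.random(n_samples) < rho)
    v = rng.standard_normal(n_samples)
    gamma_grid = np.linspace(0.01, 2.0, 100)       # candidate values of gamma_p
    sigma_sq = sigma0_sq + beta * np.mean(x ** 2)  # high initial condition
    history = [sigma_sq]
    best_gamma_p = gamma_grid[-1]
    for _ in range(n_iter):
        z = x + np.sqrt(sigma_sq) * v
        best_mse = np.inf
        for gamma_p in gamma_grid:
            if np.mean(np.abs(z) > gamma_p) >= 1.0 / beta:
                continue                           # violates constraint (50)
            mse = np.mean((x - soft_threshold(z, gamma_p)) ** 2)
            if mse < best_mse:
                best_mse, best_gamma_p = mse, gamma_p
        sigma_sq = sigma0_sq + beta * best_mse     # update (51)
        history.append(sigma_sq)
    z = x + np.sqrt(sigma_sq) * v
    # Recover gamma from (42b) rearranged for s = 1.
    gamma = best_gamma_p * (1.0 - beta * np.mean(np.abs(z) > best_gamma_p))
    return sigma_sq, best_gamma_p, gamma, history
```

The returned `history` makes it easy to inspect whether the sequence of effective noise levels decreases from the chosen initial condition.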
Fig. 3. MSE performance prediction with the Replica MAP Claim. Plotted is the median normalized SE for various sparse recovery algorithms: linear MMSE estimation, lasso, zero norm-regularized estimation, and optimal MMSE estimation. Solid lines show the asymptotic predicted MSE from the Replica MAP Claim. For the linear and lasso estimators, the circles and triangles show the actual median SE over 1000 Monte Carlo simulations. The unknown vector has i.i.d. Bernoulli-Gaussian components with a 90% probability of being zero. The noise level is set so that $\mathrm{SNR}_0 = 10$ dB. See text for details.



VI. Numerical Simulation 
A. Bernoulli-Gaussian Mixture Distribution 

As discussed above, the replica method is based on certain 
unproven assumptions and even then is only an asymptotic 
result for the large dimension limit. To validate the predictive 
power of the Replica MAP Claim for finite dimensions, we 
first performed numerical simulations where the components of $x$ are a zero-mean Bernoulli-Gaussian process, or equivalently a two-component, zero-mean Gaussian mixture where one component has zero variance. Specifically,

$$x_j \sim \begin{cases} \mathcal{N}(0,1), & \text{with prob. } \rho; \\ 0, & \text{with prob. } 1 - \rho, \end{cases}$$

where $\rho$ represents a sparsity ratio. In the experiments, $\rho = 0.1$. This is one of many possible sparse priors.

We took the vector $x$ to have $n = 100$ i.i.d. components with this prior, and we varied $m$ for 10 different values of $\beta = n/m$ from 0.5 to 3. For the measurements (3), we took a measurement matrix $A$ with i.i.d. Gaussian components and a constant scale factor matrix $S = I$. The noise level $\sigma_0^2$ was set so that $\mathrm{SNR}_0 = 10$ dB, where $\mathrm{SNR}_0$ is the signal-to-noise ratio with perfect side information defined in (16).

We simulated various estimators and compared their performances against the asymptotic values predicted by the replica analysis. For each value of $\beta$, we performed 1000 Monte Carlo trials of each estimator. For each trial, we measured the normalized squared error (SE) in dB

$$10 \log_{10}\left(\frac{\|\hat{x} - x\|^2}{\|x\|^2}\right),$$

where $\hat{x}$ is the estimate of $x$. The results are shown in Fig. 3 with each set of 1000 trials represented by the median normalized SE in dB.
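A Monte Carlo experiment of this kind might be organized as in the sketch below. It is not the authors' simulation code: the $1/\sqrt{m}$ column scaling of $A$, the ridge-type linear estimator $(A^{\mathsf T}A + \gamma I)^{-1}A^{\mathsf T}y$ (the minimizer of the MAP objective with a quadratic cost and $S = I$), and the noise variance passed in directly as `sigma0_sq` (rather than derived from $\mathrm{SNR}_0$ via (16)) are all assumptions made for illustration.

```python
import numpy as np


def normalized_se_db(x_hat, x):
    """Normalized squared error in dB: 10*log10(||x_hat - x||^2 / ||x||^2)."""
    return 10.0 * np.log10(np.sum((x_hat - x) ** 2) / np.sum(x ** 2))


def one_trial(n, m, rho, sigma0_sq, gamma, rng):
    """One random instance with S = I and a ridge-type linear estimator."""
    x = rng.standard_normal(n) * (rng.random(n) < rho)  # Bernoulli-Gaussian
    A = rng.standard_normal((m, n)) / np.sqrt(m)        # assumed scaling
    y = A @ x + np.sqrt(sigma0_sq) * rng.standard_normal(m)
    x_hat = np.linalg.solve(A.T @ A + gamma * np.eye(n), A.T @ y)
    return normalized_se_db(x_hat, x)


def median_se_db(n=100, beta=1.0, rho=0.1, sigma0_sq=0.01, gamma=0.01,
                 n_trials=200, seed=0):
    """Median normalized SE in dB over n_trials random instances."""
    rng = np.random.default_rng(seed)
    m = int(round(n / beta))
    trials = [one_trial(n, m, rho, sigma0_sq, gamma, rng)
              for _ in range(n_trials)]
    return float(np.median(trials))
```

Sweeping `beta` over a grid and plotting `median_se_db` against it would give the simulated counterpart of a single curve in a plot of this type.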

The top curve shows the performance of the linear MMSE estimator (30). As discussed in Section V-A, the Replica MAP Claim applied to the case of a constant scale matrix $S = I$ reduces to Verdú and Shamai's result in [37]. As can be seen in Fig. 3, the result predicts the simulated performance of the linear estimator extremely well.

The next curve shows the lasso estimator (35) with the factor $\gamma$ selected to minimize the MSE as described in Section V-D. To compute the predicted value of the MSE from the Replica MAP Claim, we numerically solve the fixed-point equations (42) to obtain the effective noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$. We then use the scalar MAP model with the estimator (36) to predict the MSE. We see from Fig. 3 that the predicted MSE matches the median SE within 0.3 dB over a range of $\beta$ values. We believe that this level of accuracy in predicting lasso's performance is not achievable with any other method.

Fig. [3] also shows the theoretical minimum MSE (as com- 
puted by the Replica MMSE Claim) and the theoretical MSE 
from the zero norm-regularized estimator as computed in 
Section V-C. For these two cases, the estimators cannot be
simulated since they involve NP-hard computations. But we 
have depicted the curve to show that the replica method can 
be used to calculate the gap between practical and impractical 
algorithms. Interestingly, we see that there is about a 2 to 2.5 
dB gap between lasso and zero norm-regularized estimation, 
and another 1 to 2 dB gap between zero norm-regularized 
estimation and optimal MMSE. 

It is, of course, not surprising that zero norm-regularized 
estimation performs better than lasso for the strictly sparse 
prior considered in this simulation, and that optimal MMSE 
performs better yet. However, what is valuable is that replica 
analysis can quantify the precise performance differences. 

In Fig. 3 we plotted the median SE since there is actually considerable variation in the SE over the random realizations of the problem parameters. To illustrate the degree of variability, Fig. 4 shows the CDF of the SE values over the 1000 Monte Carlo trials. We see that there is large variation in the SE, especially at the smaller dimension $n = 100$. While the
median value agrees well with the theoretical replica limit, any 
particular instance of the problem can vary considerably from 
that limit. This is perhaps the most significant drawback of 
the replica method: at lower dimensions, the replica method 
may provide accurate predictions of the median behavior, but 
does not bound the variations from the median. 

As one might expect, at the higher dimension of n = 
500, the level of variability is reduced and the observed SE 
begins to concentrate around the replica limit. In his original 
paper [8], Tanaka assumes that concentration of the SE will 
occur; he calls this the self-averaging assumption. Fig. 4
provides some empirical evidence that self-averaging does 
indeed occur. However, even at n = 500, the variation is 
not insignificant. As a result, caution should be exercised in 
using the replica predictions on particular low-dimensional 
instances. 
Fig. 4. Convergence to the Replica MAP limit. Plotted are the CDFs of the SE over 1000 Monte Carlo trials of the lasso method for the Gaussian mixture distribution. Details are in the text. The CDF is shown for dimensions $n = 100$ and $n = 500$ and $\beta = 1$ and 2. As vector dimension increases, the performance begins to concentrate around the limit predicted by the Replica MAP Claim.

B. Discrete Distribution with Dynamic Range 

The Replica MAP Claim can also be used to study the effects of dynamic range in power levels. To validate the replica analysis with power variations, we ran the following experiment: the vector $x$ was generated with i.i.d. components

$$x_j = \sqrt{s_j}\,u_j, \tag{52}$$

where $s_j$ is a random power level and $u_j$ is a discrete three-valued random variable with probability mass function

$$u_j = \begin{cases} 1/\sqrt{\rho}, & \text{with prob. } \rho/2; \\ -1/\sqrt{\rho}, & \text{with prob. } \rho/2; \\ 0, & \text{with prob. } 1 - \rho. \end{cases} \tag{53}$$

As before, the parameter $\rho$ represents the sparsity ratio and we chose a value of $\rho = 0.1$. The measurements were generated by

$$y = Ax + w = AS^{1/2}u + w,$$

where $A$ is an i.i.d. Gaussian measurement matrix and $w$ is Gaussian noise. As in the previous section, the post-despreading SNR with side-information was normalized to 10 dB.

The factor $s_j$ in (52) accounts for power variations in $x_j$. We considered two random distributions for $s_j$: (a) $s_j = 1$, so that the power level is constant; and (b) $s_j$ is uniform (in dB scale) over a 10 dB range with average unit power.
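The two ingredients of this signal model can be sampled as in the sketch below. The paper does not spell out how the 10 dB range is positioned; here it is assumed to be symmetric about 0 dB before a deterministic rescaling to unit average power, using the exact mean of $10^{U/10}$ for $U$ uniform on $[-R/2, R/2]$. The function names are illustrative.

```python
import numpy as np


def sample_power_levels(n, range_db=10.0, rng=None):
    """Power levels s_j, uniform in dB over range_db, rescaled to unit mean."""
    rng = np.random.default_rng() if rng is None else rng
    level_db = rng.uniform(-range_db / 2.0, range_db / 2.0, size=n)
    s = 10.0 ** (level_db / 10.0)
    # E[10^(U/10)] for U ~ Uniform[-R/2, R/2] equals sinh(a)/a, a = ln(10)*R/20.
    a = np.log(10.0) * range_db / 20.0
    return s / (np.sinh(a) / a)


def sample_three_valued(n, rho, rng=None):
    """Samples of u_j from the three-point distribution (53)."""
    rng = np.random.default_rng() if rng is None else rng
    amplitude = 1.0 / np.sqrt(rho)
    return rng.choice([amplitude, -amplitude, 0.0], size=n,
                      p=[rho / 2.0, rho / 2.0, 1.0 - rho])
```

A vector following (52) is then `np.sqrt(s) * u`, with `s` and `u` drawn independently from the two functions.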

In case (b), when there is variation in the power levels, we can analyze two different scenarios for the lasso estimator:
• Power variations unknown: If the power level $s_j$ in (52) is unknown to the estimator, then we can apply the standard lasso estimator

$$\hat{x}(y) = \arg\min_{x \in \mathbb{R}^n} \frac{1}{2\gamma}\,\|y - Ax\|_2^2 + \|x\|_1, \tag{54}$$

which does not need knowledge of the power levels $s_j$. To analyze the behavior of this estimator with the replica method, we simply incorporate variations of both $u_j$ and $s_j$ into the prior of $x_j$ and assume a constant scale factor $s$ in the replica equations.
• Power variations known: If the power levels $s_j$ are known, the estimator can compute

$$\hat{u}(y) = \arg\min_{u \in \mathbb{R}^n} \frac{1}{2\gamma}\,\|y - AS^{1/2}u\|_2^2 + \|u\|_1 \tag{55}$$

and then take $\hat{x} = S^{1/2}\hat{u}$. This can be analyzed with the replica method by incorporating the distribution of $s_j$ into the scale factors.

Fig. 5. MSE performance prediction by the replica method of the lasso estimator with power variations in the components. Plotted is the median MSE of the lasso method in estimating a discrete-valued distribution, as a function of the measurement ratio $\beta = n/m$. Three scenarios are considered: (a) all components are the same power; (b) the components have a 10 dB range in power that is unknown to the estimator; and (c) the power range is known to the estimator and incorporated into the measurement matrix. Solid lines represent the Replica MAP asymptotic prediction and the circles, triangles, and squares show the median MSE over 1000 Monte Carlo simulations. See text for details.
Fig. 5 shows the performance of the lasso estimator for the different power range scenarios. As before, for each $\beta$, the figure plots the median SE over 1000 Monte Carlo simulation trials. Fig. 5 also shows the theoretical asymptotic performance as predicted by the Replica MAP Claim. Simulated values are based on a vector dimension of $n = 100$ and optimal selection of $\gamma$ as described in Section V-D.

We see that in all three cases (constant power and power variations unknown and known to the estimator), the replica prediction is in excellent agreement with the simulated performance. With one exception, the replica method matches the simulated performance within 0.2 dB. The one exception is for $\beta = 2.5$ with constant power, where the replica method underpredicts the median MSE by about 1 dB. A simulation at a higher dimension of $n = 500$ (not shown here) reduced this discrepancy to 0.2 dB, suggesting that the replica method is still asymptotically correct.

We can also observe two interesting phenomena in Fig. 5. First, the lasso method's performance with constant power is almost identical to the performance with unknown power variations for values of $\beta < 2$. However, at higher values of $\beta$, the power variations actually improve the performance of the lasso method, even though the average power is the same in both cases. Wainwright's analysis [26] demonstrated the significance of the minimum component power in dictating lasso's performance. The above simulation and the corresponding replica predictions suggest that dynamic range may also play a role in the performance of lasso. That increased dynamic range can improve the performance of sparse estimation has been observed for other estimators [38], [39].

A second phenomenon we see in Fig. 5 is that knowing the power variations and incorporating them into the measurement matrix can actually degrade the performance of lasso. Indeed, knowing the power variations appears to result in a 1 to 2 dB loss in MSE performance.

Of course, one cannot conclude from this one simulation 
that these effects of dynamic range hold more generally. The 
study of the effect of dynamic range is interesting and beyond 
the scope of this work. The point is that the replica method 
provides a simple analytic method for quantifying the effect 
of dynamic range that appears to match actual performance 
well. 

C. Support Recovery with Thresholding 

In estimating vectors with strictly sparse priors, one im- 
portant problem is to detect the locations of the nonzero 
components in the vector x. This problem, sometimes called 
support recovery, arises for example in subset selection in 
linear regression [40], where finding the support of the vector 
x corresponds to determining a subset of features with strong
linear influence on some observed data y. Several works have 
attempted to find conditions under which the support of a 
sparse vector x can be fully detected [26], [33], [41] or 
partially detected [42]-[45]. Unfortunately, with the exception 
of [26], the only available results are bounds that are not tight. 

One of the uses of the Replica MAP Claim is to exactly predict the fraction of support that can be detected correctly. To see how to predict the support recovery performance, observe that the Replica MAP Claim provides the asymptotic joint distribution for the vector $(x_j, s_j, \hat{x}_j)$, where $x_j$ is the component of the unknown vector, $s_j$ is the corresponding scale factor, and $\hat{x}_j$ is the component estimate. Now, in support recovery, we want to estimate $\theta_j$, the indicator function that $x_j$ is nonzero,

$$\theta_j = \begin{cases} 1, & \text{if } x_j \neq 0; \\ 0, & \text{if } x_j = 0. \end{cases}$$

One natural estimate for $\theta_j$ is to compare the magnitude of the component estimate $\hat{x}_j$ to some scale-dependent threshold $t(s_j)$,

$$\hat{\theta}_j = \begin{cases} 1, & \text{if } |\hat{x}_j| > t(s_j); \\ 0, & \text{if } |\hat{x}_j| \le t(s_j). \end{cases}$$

This idea of using thresholding for sparsity detection has been proposed in [32] and [46]. Using the joint distribution $(x_j, s_j, \hat{x}_j)$, one can then compute the probability of sparsity misdetection

$$p_{\mathrm{err}} = \Pr(\hat{\theta}_j \neq \theta_j).$$

The probability of error can be minimized over the threshold levels $t(s)$.

Fig. 6. Support recovery performance prediction with the replica method. The solid lines show the theoretical probability of error in sparsity misdetection using linear and lasso estimation followed by optimal thresholding, plotted against the measurement ratio $\beta = n/m$. The circles and triangles are the corresponding mean probabilities of misdetection over 1000 Monte Carlo trials. Legend: Linear+thresholding (replica), Linear+thresholding (sim.), Lasso+thresholding (replica), Lasso+thresholding (sim.).
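The thresholding detector and its empirical misdetection rate can be written in a few lines. The sketch below works on arrays of true components and their estimates, however those were obtained (simulation or samples from the scalar model), and selects the threshold from a user-supplied grid for a single scale level; the function names and the grid search are assumptions of this example.

```python
import numpy as np


def support_estimate(x_hat, t):
    """Thresholding detector: True where |x_hat_j| > t."""
    return np.abs(x_hat) > t


def misdetection_prob(x, x_hat, t):
    """Empirical fraction of components whose support indicator is wrong."""
    return float(np.mean(support_estimate(x_hat, t) != (x != 0)))


def best_threshold(x, x_hat, grid):
    """Threshold on the grid with the smallest empirical misdetection rate."""
    errors = [misdetection_prob(x, x_hat, t) for t in grid]
    k = int(np.argmin(errors))
    return grid[k], errors[k]
```

For example, with `x = [0, 1, 0, -2]` and `x_hat = [0.1, 0.8, -0.3, -1.5]`, a threshold of 0.5 separates the zero and nonzero components exactly, while a threshold of 0 declares every component nonzero.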

To verify this calculation, we generated random vectors $x$ with $n = 100$ i.i.d. components given by (52) and (53). We used a constant power ($s_j = 1$) and a sparsity fraction of $\rho = 0.2$. As before, the observations $y$ were generated with an i.i.d. Gaussian matrix with $\mathrm{SNR}_0 = 10$ dB.

Fig. 6 compares the theoretical probability of sparsity misdetection predicted by the replica method against the actual probability of misdetection based on the average of 1000 Monte Carlo trials. We tested two algorithms: linear MMSE estimation and lasso estimation. For lasso, the regularization parameter was selected for minimum MSE as described in Section V-D. The results show a good match.

VII. Conclusions and Future Work 

We have applied the replica method from statistical physics 
for computing the asymptotic performance of MAP estimation 
of non-Gaussian vectors with large random linear measure- 
ments. The method can be readily applied to problems in 
compressed sensing. While the method is not theoretically 
rigorous, simulations show an excellent ability to predict the 
performance for a range of algorithms, performance metrics, 
and input distributions. Indeed, we believe that the replica 
method provides the only method to date for asymptotically- 
exact prediction of performance of compressed sensing algo- 
rithms that can apply in a large range of circumstances. 

Moreover, we believe that the availability of a simple scalar 
model that exactly characterizes certain sparse estimators 
opens up numerous avenues for analysis. For one thing, it 
would be useful to see if the replica analysis of lasso can
be used to recover the scaling laws of Wainwright [26] and 
Donoho and Tanner [27] for support recovery and to extend 
the latter to the noisy setting. Also, the best known bounds 
for MSE performance in sparse estimation are given by Haupt 
and Nowak [47] and Candes and Tao [48]. Since the replica 
analysis is asymptotically exact, we may be able to obtain 
much tighter analytic expressions. In a similar vein, several 
researchers have attempted to find information-theoretic lower 
bounds with optimal estimation [33], [41], [49]. Using the 
replica analysis of optimal estimators, one may be able to 
improve these scaling laws as well. 

Finally, there is a well-understood connection between sta- 
tistical mechanics and belief propagation-based decoding of 
error correcting codes [6], [7]. These connections may suggest 
improved iterative algorithms for sparse estimation as well. 

Appendix I 
Proof Overview 

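Fix a deterministic sequence of indices $j = j(n)$ with $j(n) \in \{1, \ldots, n\}$. For each $n$, define the random vector triples

$$\theta^{u}(n) = \big(x_j(n),\, s_j(n),\, \hat{x}^{u}_j(n)\big), \tag{56a}$$

$$\theta^{\mathrm{map}}(n) = \big(x_j(n),\, s_j(n),\, \hat{x}^{\mathrm{map}}_j(n)\big), \tag{56b}$$

where $x_j(n)$, $\hat{x}^{u}_j(n)$, and $\hat{x}^{\mathrm{map}}_j(n)$ are the $j$th components of the random vectors $x$, $\hat{x}^{u}(y)$, and $\hat{x}^{\mathrm{map}}(y)$, and $s_j(n)$ is the $j$th diagonal entry of the matrix $S$.

For each $u$, we will use the notation

$$\hat{x}^{u}_{\mathrm{scalar}}(z;\lambda) = \hat{x}^{\mathrm{pmmse}}_{\mathrm{scalar}}(z;\, p_u,\, \lambda/u), \tag{57}$$

where $p_u$ is defined in (20) and $\hat{x}^{\mathrm{pmmse}}_{\mathrm{scalar}}(z;\,\cdot\,,\,\cdot\,)$ is defined in (10). Also, for every $\sigma$ and $\gamma > 0$, define the random vectors

$$\theta^{u}_{\mathrm{scalar}}(\sigma^2, \gamma) = \big(x,\, s,\, \hat{x}^{u}_{\mathrm{scalar}}(z;\gamma/s)\big), \tag{58a}$$

$$\theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2, \gamma) = \big(x,\, s,\, \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\gamma/s)\big), \tag{58b}$$

where $x$ and $s$ are independent with $x \sim p_0(x)$, $s \sim p_S(s)$, and

$$z = x + \frac{\sigma}{\sqrt{s}}\,v \tag{59}$$

with $v \sim \mathcal{N}(0,1)$.

Now, to prove the Replica MAP Claim, we need to show that (under the stated assumptions)

$$\lim_{n\to\infty} \theta^{\mathrm{map}}(n) = \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff,map}}, \gamma_p), \tag{60}$$

where the limit is in distribution and the noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ satisfy part (b) of the claim. This desired equivalence is depicted in the right column of Fig. 7.

To show this limit we first observe that under Assumption 1, for $u$ sufficiently large, the postulated prior distribution $p_u(x)$ in (20) and noise level $\sigma_u^2$ in (21) are assumed to satisfy the Replica MMSE Claim. Satisfying the Replica MMSE Claim implies that

$$\lim_{n\to\infty} \big(x_j(n),\, s_j(n),\, \hat{x}^{u}_j(n)\big) = \big(x,\, s,\, \hat{x}^{\mathrm{pmmse}}_{\mathrm{scalar}}(z;\, p_u,\, \sigma^2_{\mathrm{p\text{-}eff}}(u)/s)\big), \tag{61}$$

where the limit is in distribution, $x \sim p_0(x)$, $s \sim p_S(s)$, and

$$z = x + \frac{\sigma_{\mathrm{eff}}(u)}{\sqrt{s}}\,v, \qquad v \sim \mathcal{N}(0,1).$$

Using the notation above, we can rewrite this limit as

$$
\begin{aligned}
\lim_{n\to\infty} \theta^{u}(n)
&\overset{(a)}{=} \lim_{n\to\infty} \big(x_j(n),\, s_j(n),\, \hat{x}^{u}_j(n)\big) \\
&\overset{(b)}{=} \big(x,\, s,\, \hat{x}^{\mathrm{pmmse}}_{\mathrm{scalar}}(z;\, p_u,\, \sigma^2_{\mathrm{p\text{-}eff}}(u)/s)\big) \\
&\overset{(c)}{=} \big(x,\, s,\, \hat{x}^{u}_{\mathrm{scalar}}(z;\, u\sigma^2_{\mathrm{p\text{-}eff}}(u)/s)\big) \\
&\overset{(d)}{=} \theta^{u}_{\mathrm{scalar}}\big(\sigma^2_{\mathrm{eff}}(u),\, u\sigma^2_{\mathrm{p\text{-}eff}}(u)\big),
\end{aligned} \tag{62}
$$

where all the limits are in distribution and (a) follows from the definition of $\theta^{u}(n)$ in (56a); (b) follows from (61); (c) follows from (57); and (d) follows from (58a). This equivalence is depicted in the left column of Fig. 7.

Fig. 7. The Replica MAP Claim of this paper relates $\hat{x}^{\mathrm{map}}_j(n)$ to $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\lambda_p)$ through an $n \to \infty$ limit. We establish the equivalence of its validity to the validity of the Replica MMSE Claim [10] through two $u \to \infty$ limits: Appendix III relates $\hat{x}^{u}_j(n)$ and $\hat{x}^{\mathrm{map}}_j(n)$; Appendix IV relates $\hat{x}^{u}_{\mathrm{scalar}}(z;\lambda)$ and $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\lambda)$.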

The key part of the proof is to use a large deviations argument to show that for almost all $y$,

$$\lim_{u\to\infty} \hat{x}^{u}(y) = \hat{x}^{\mathrm{map}}(y).$$

This limit in turn shows (see Lemma 5 of Appendix III) that for every $n$,

$$\lim_{u\to\infty} \theta^{u}(n) = \theta^{\mathrm{map}}(n) \tag{63}$$

almost surely and in distribution. A large deviation argument is also used to show that for every $\lambda$ and almost all $z$,

$$\lim_{u\to\infty} \hat{x}^{u}_{\mathrm{scalar}}(z;\lambda) = \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z;\lambda).$$

Combining this with the limits in Assumption 2, we will see (see Appendix IV) that

$$\lim_{u\to\infty} \theta^{u}_{\mathrm{scalar}}\big(\sigma^2_{\mathrm{eff}}(u),\, u\sigma^2_{\mathrm{p\text{-}eff}}(u)\big) = \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff,map}}, \gamma_p) \tag{64}$$

almost surely and in distribution.

The equivalences (63) and (64) are shown as rows in Fig. 7. As shown, they combine with the Replica MMSE Claim to prove the Replica MAP Claim. In equations instead of diagrammatic form, the combination of limits is

$$
\begin{aligned}
\lim_{n\to\infty} \theta^{\mathrm{map}}(n)
&\overset{(a)}{=} \lim_{n\to\infty} \lim_{u\to\infty} \theta^{u}(n) \\
&\overset{(b)}{=} \lim_{u\to\infty} \lim_{n\to\infty} \theta^{u}(n) \\
&\overset{(c)}{=} \lim_{u\to\infty} \theta^{u}_{\mathrm{scalar}}\big(\sigma^2_{\mathrm{eff}}(u),\, u\sigma^2_{\mathrm{p\text{-}eff}}(u)\big) \\
&\overset{(d)}{=} \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff,map}}, \gamma_p),
\end{aligned}
$$

where all the limits are in distribution and (a) follows from (63); (b) follows from Assumption 3; (c) follows from (62); and (d) follows from (64). This proves (60) and part (a) of the claim.

Therefore, to prove the claim we prove the limit (63) in Appendix III and the limit (64) in Appendix IV, and show in Appendix V that the limiting noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ satisfy the fixed-point equations in part (b) of the claim. Before these results are given, we review in Appendix II some requisite results from large deviations theory.

Appendix II 
Large Deviations Results 

We begin by reviewing some standard results from large 
deviations theory. The basic result we need is Laplace's 
principle as described in [50]. 

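Lemma 1 (Laplace's Principle): Let $\varphi(x)$ be any measurable function defined on some measurable subset $\mathcal{D} \subseteq \mathbb{R}^n$ such that

$$\int_{x\in\mathcal{D}} \exp(-\varphi(x))\,dx < \infty. \tag{65}$$

Then

$$\lim_{u\to\infty} \frac{1}{u} \log \int_{x\in\mathcal{D}} \exp(-u\varphi(x))\,dx = -\operatorname*{ess\,inf}_{x\in\mathcal{D}} \varphi(x).$$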
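Laplace's principle is easy to observe numerically in one dimension. The sketch below evaluates $\frac{1}{u}\log\int \exp(-u\varphi(x))\,dx$ with a Riemann sum on a grid, using a log-sum-exp shift so that large $u$ does not underflow. The test function $\varphi(x) = (x-1)^2 + 0.5$ in the usage note, the grid, and the function name are choices made for this illustration.

```python
import numpy as np


def laplace_log_integral(phi, grid, u):
    """(1/u) * log of the integral of exp(-u*phi(x)), via a Riemann sum on grid."""
    exponent = -u * phi(grid)
    shift = exponent.max()                  # factor out the largest term
    dx = grid[1] - grid[0]                  # uniform grid spacing assumed
    scaled_integral = np.sum(np.exp(exponent - shift)) * dx
    return (shift + np.log(scaled_integral)) / u
```

With `phi = lambda x: (x - 1.0) ** 2 + 0.5` and `grid = np.linspace(-5, 5, 20001)`, the returned value approaches $-0.5 = -\min\varphi$ as `u` grows, with a gap that shrinks roughly like $\log(u)/u$.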

Given $\varphi(x)$ as in Lemma 1, define the probability distribution

$$q_u(x) = \left[\int_{x\in\mathcal{D}} \exp(-u\varphi(x))\,dx\right]^{-1} \exp(-u\varphi(x)). \tag{66}$$

We want to evaluate expectations of the form

$$\lim_{u\to\infty} \int_{x\in\mathcal{D}} g(u,x)\,q_u(x)\,dx$$

for some real-valued measurable function $g(u,x)$. The following lemma shows that this integral is described by the behavior of $g(u,x)$ in a neighborhood of the minimizer of $\varphi(x)$.

Lemma 2: Suppose that $\varphi(x)$ and $g(u,x)$ are real-valued measurable functions satisfying:

(a) The function $\varphi(x)$ satisfies (65) and has a unique essential minimizer $\hat{x} \in \mathbb{R}^n$ such that for every open neighborhood $U$ of $\hat{x}$,

$$\inf_{x\notin U} \varphi(x) > \varphi(\hat{x}).$$

(b) The function $g(u,x) > 0$ and satisfies

$$\limsup_{u\to\infty}\, \sup_{x\notin U} \frac{\log g(u,x)}{u(\varphi(x) - \varphi(\hat{x}))} \le 0$$

for every open neighborhood $U$ of $\hat{x}$.

(c) There exists a constant $g_\infty$ such that for every $\epsilon > 0$, there exists a neighborhood $U$ of $\hat{x}$ such that

$$\limsup_{u\to\infty} \left|\int_{U} g(u,x)\,q_u(x)\,dx - g_\infty\right| < \epsilon.$$

Then,

$$\lim_{u\to\infty} \int g(u,x)\,q_u(x)\,dx = g_\infty.$$

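Proof: Due to item (c), we simply have to show that for any open neighborhood $U$ of $\hat{x}$,

$$\limsup_{u\to\infty} \int_{x\in U^c} g(u,x)\,q_u(x)\,dx = 0.$$

To this end, let

$$Z(u) = \log \int_{x\in U^c} g(u,x)\,q_u(x)\,dx.$$

We need to show that $Z(u) \to -\infty$ as $u \to \infty$. Using the definition of $q_u(x)$ in (66), it is easy to check that

$$Z(u) = Z_1(u) - Z_2(u), \tag{67}$$

where

$$Z_1(u) = \log \int_{x\in U^c} g(u,x) \exp\big(-u(\varphi(x) - \varphi(\hat{x}))\big)\,dx,$$

$$Z_2(u) = \log \int_{x\in\mathcal{D}} \exp\big(-u(\varphi(x) - \varphi(\hat{x}))\big)\,dx.$$

Now, let

$$M = \operatorname*{ess\,inf}_{x\in U^c} \varphi(x) - \varphi(\hat{x}).$$

By item (a), $M > 0$. Therefore, we can find a $\delta > 0$ such that

$$-M(1-\delta) + 3\delta < 0. \tag{68}$$

Now, from item (b), there exists a $u_0$ such that for all $u > u_0$,

$$Z_1(u) \le \log \int_{x\in U^c} \exp\big(-u(1-\delta)(\varphi(x) - \varphi(\hat{x}))\big)\,dx.$$

By Laplace's principle, we can find a $u_1$ such that for all $u > u_1$,

$$Z_1(u) \le u\left[\delta - \operatorname*{ess\,inf}_{x\in U^c} (1-\delta)(\varphi(x) - \varphi(\hat{x}))\right] = u\big(-M(1-\delta) + \delta\big). \tag{69}$$

Also, since $\hat{x}$ is an essential minimizer of $\varphi(x)$,

$$\operatorname*{ess\,inf}_{x\in\mathcal{D}} \varphi(x) = \varphi(\hat{x}).$$

Therefore, by Laplace's principle, there exists a $u_2$ such that for $u > u_2$,

$$Z_2(u) \ge u\left[-\delta - \operatorname*{ess\,inf}_{x\in\mathcal{D}} (\varphi(x) - \varphi(\hat{x}))\right] = -u\delta. \tag{70}$$

Substituting (69) and (70) into (67), we see that for $u$ sufficiently large,

$$Z(u) \le u\big(-M(1-\delta) + \delta\big) + u\delta < -u\delta,$$

where the last inequality follows from (68). This shows $Z(u) \to -\infty$ as $u \to \infty$ and the proof is complete. ■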
One simple application of this lemma is as follows: 
Lemma 3: Let $\varphi(x)$ and $h(x)$ be real-valued measurable functions such that the distribution $q_u(x)$ satisfies the following:

(a) The function $\varphi(x)$ has a unique essential minimizer $\hat{x}$ such that for every open neighborhood $U$ of $\hat{x}$,

$$\inf_{x\notin U} \varphi(x) > \varphi(\hat{x}).$$

(b) The function $h(x)$ is continuous at $\hat{x}$.

(c) There exists a $c > 0$ such that for all $x \neq \hat{x}$,

$$\varphi(x) - \varphi(\hat{x}) \ge c \log|h(x) - h(\hat{x})|.$$

Then,

$$\lim_{u\to\infty} \int h(x)\,q_u(x)\,dx = h(\hat{x}).$$

Proof: We will apply Lemma 2 with $g(u,x) = |h(x) - h(\hat{x})|$ and $g_\infty = 0$. Item (a) of this lemma shows that $\varphi(x)$ satisfies item (a) in Lemma 2.

To verify that item (b) of Lemma 2 holds, observe that item (c) of this lemma shows that for all $x \neq \hat{x}$,

$$\frac{\log g(u,x)}{\varphi(x) - \varphi(\hat{x})} = \frac{\log|h(x) - h(\hat{x})|}{\varphi(x) - \varphi(\hat{x})} \le \frac{1}{c}.$$

Hence, for any open neighborhood $U$ of $\hat{x}$,

$$\limsup_{u\to\infty}\, \sup_{x\notin U} \frac{\log g(u,x)}{u(\varphi(x) - \varphi(\hat{x}))} \le \lim_{u\to\infty} \frac{1}{cu} = 0.$$

Now let us verify that item (c) of Lemma 2 holds. Let $\epsilon > 0$. Since $h(x)$ is continuous at $\hat{x}$, there exists an open neighborhood $U$ of $\hat{x}$ such that $g(u,x) < \epsilon$ for all $x \in U$ and $u$. This implies that for all $u$,

$$\int_{U} g(u,x)\,q_u(x)\,dx < \epsilon \int_{U} q_u(x)\,dx \le \epsilon,$$

which shows that $g(u,x)$ satisfies item (c) of Lemma 2. Thus

$$
\begin{aligned}
\left|\int h(x)\,q_u(x)\,dx - h(\hat{x})\right|
&= \left|\int (h(x) - h(\hat{x}))\,q_u(x)\,dx\right| \\
&\le \int |h(x) - h(\hat{x})|\,q_u(x)\,dx \\
&\le \int g(u,x)\,q_u(x)\,dx \to 0,
\end{aligned}
$$

where the last limit is as $u \to \infty$ and follows from Lemma 2. ■
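The concentration described by this lemma can be illustrated numerically in one dimension with $h(x) = x$. The sketch below computes the mean of $q_u$ on a grid for the lasso-type scalar objective $\varphi(x) = |z - x|^2/(2\lambda) + |x|$, whose minimizer is the soft-thresholded value of $z$; as $u$ grows, the mean should approach that minimizer. The grid, the choice of $\varphi$, and the function names are assumptions of this example, not part of the proof.

```python
import numpy as np


def posterior_mean(phi, grid, u):
    """Mean of q_u(x), proportional to exp(-u*phi(x)), on a uniform grid."""
    log_weight = -u * phi(grid)
    weight = np.exp(log_weight - log_weight.max())  # stabilized weights
    return float(np.sum(grid * weight) / np.sum(weight))


def soft_threshold(z, lam):
    """Minimizer of |z - x|^2 / (2*lam) + |x| over x."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))
```

For instance, with $\lambda = 0.5$ and $z = 2$, `posterior_mean` evaluated at $u = 1000$ on `np.linspace(-10, 10, 200001)` should be close to the soft-thresholded value 1.5.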



Appendix III 

Evaluation of $\lim_{u\to\infty} \hat{x}^{u}(y)$

We can now apply Laplace's principle in the previous section to prove (63). We begin by examining the pointwise convergence of the MMSE estimator $\hat{x}^{u}(y)$.

Lemma 4: For every $n$, $A$, and $S$ and almost all $y$,

$$\lim_{u\to\infty} \hat{x}^{u}(y) = \hat{x}^{\mathrm{map}}(y),$$

where $\hat{x}^{u}(y)$ is the MMSE estimator in (23) and $\hat{x}^{\mathrm{map}}(y)$ is the MAP estimator in (18).



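Proof: The lemma is a direct application of Lemma 3. Fix $n$, $y$, $A$, and $S$ and let

$$\varphi(x) = \frac{1}{2\gamma}\,\|y - AS^{1/2}x\|^2 + f(x). \tag{71}$$

The definition of $\hat{x}^{\mathrm{map}}(y)$ in (18) shows that

$$\hat{x}^{\mathrm{map}}(y) = \arg\min_{x\in\mathcal{X}^n} \varphi(x).$$

Assumption 4 shows that this minimizer is unique for almost all $y$. Also (22) shows that

$$p_{x|y}(x \mid y\,;\, p_u, \sigma_u^2) = \left[\int_{x\in\mathcal{X}^n} \exp(-u\varphi(x))\,dx\right]^{-1} \exp(-u\varphi(x)) = q_u(x),$$

where $q_u(x)$ is given in (66) with $\mathcal{D} = \mathcal{X}^n$. Therefore, using (23),

$$\hat{x}^{u}(y) = \mathbf{E}\big(x \mid y\,;\, p_u, \sigma_u^2\big) = \int_{x\in\mathcal{X}^n} x\,q_u(x)\,dx. \tag{72}$$

Now, to prove the lemma, we need to show that

$$\lim_{u\to\infty} \hat{x}^{u}_j(y) = \hat{x}^{\mathrm{map}}_j(y)$$

for every component $j = 1, \ldots, n$. To this end, fix a component index $j$. Using (72), we can write the $j$th component of $\hat{x}^{u}(y)$ as

$$\hat{x}^{u}_j(y) = \int_{x\in\mathcal{X}^n} h(x)\,q_u(x)\,dx,$$

where $h(x) = x_j$. The function $h(x)$ is continuous. Also, using Assumption 5, it is straightforward to show that item (c) of Lemma 3 is satisfied for some $c > 0$. Thus, the hypotheses of Lemma 3 are satisfied and we have the limit

$$\lim_{u\to\infty} \hat{x}^{u}_j(y) = h(\hat{x}^{\mathrm{map}}(y)) = \hat{x}^{\mathrm{map}}_j(y).$$

This proves the lemma. ■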

Lemma 5: Consider the random vectors $\theta^u(n)$ and $\theta^{\mathrm{map}}(n)$ defined in (56a) and (56b), respectively. Then, for all $n$,

$$\lim_{u \to \infty} \theta^u(n) = \theta^{\mathrm{map}}(n) \tag{73}$$

almost surely and in distribution.

Proof: The vectors $\theta^u(n)$ and $\theta^{\mathrm{map}}(n)$ are deterministic
functions of $x(n)$, $A(n)$, $S(n)$, and $y$. Lemma 4 shows that the
limit (73) holds for any values of $x(n)$, $A(n)$, and $S(n)$, and
almost all y. Since y has a continuous probability distribution 
(due to the additive noise w in (O), the set of values where this 
limit does not hold must have probability zero. Thus, the limit 
(73) holds almost surely, and therefore, also in distribution. ■



Appendix IV

Evaluation of $\lim_{u \to \infty} \hat{x}^u_{\mathrm{scalar}}(z ; \lambda)$

We first show the pointwise convergence of the scalar MMSE estimator $\hat{x}^u_{\mathrm{scalar}}(z ; \lambda)$.
Lemma 6: Consider the scalar estimators $\hat{x}^u_{\mathrm{scalar}}(z ; \lambda)$ defined in (57) and $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)$ defined in (24). For all $\lambda > 0$ and almost all $z$, we have the deterministic limit

$$\lim_{u \to \infty} \hat{x}^u_{\mathrm{scalar}}(z ; \lambda) = \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda).$$



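Proof: The proof is similar to that of Lemma 4. Fix $z$ and $\lambda$ and consider the conditional distribution $p_{x|z}(x \mid z ; p_u, \lambda/u)$. Using the definition of $p_{x|z}$ along with the definition of $p_u(x)$ in (20) and an argument similar to the proof of Lemma 4, it is easily checked that

$$p_{x|z}(x \mid z ; p_u, \lambda/u) = q_u(x), \tag{74}$$

where $q_u(x)$ is given by (66) with $\mathcal{D} = \mathcal{X}$ and

$$\varphi(x) = F(x, z, \lambda), \tag{75}$$

where $F(x, z, \lambda)$ is defined in (25). Using (57) and (10),

$$\hat{x}^u_{\mathrm{scalar}}(z ; \lambda) = \hat{x}^{\mathrm{mmse}}_{\mathrm{scalar}}(z ; p_u, \lambda/u) = \int_{x \in \mathcal{X}} x\, p_{x|z}(x \mid z ; p_u, \lambda/u)\,dx = \int_{x \in \mathcal{X}} h(x) q_u(x)\,dx,$$

with $h(x) = x$.

We can now apply Lemma 3. The definition of $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)$ in (24) shows that

$$\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda) = \arg\min_{x \in \mathcal{X}} \varphi(x). \tag{76}$$

Assumption 6 shows that for all $\lambda > 0$ and almost all $z$, this minimization is unique so

$$\varphi(x) > \varphi(\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)),$$

for all $x \neq \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)$. Also,

$$\lim_{|x| \to \infty} \varphi(x) \overset{(a)}{=} \lim_{|x| \to \infty} F(x, z, \lambda) \overset{(b)}{\ge} \lim_{|x| \to \infty} f(x) \overset{(c)}{=} \infty, \tag{77}$$

where (a) follows from (75); (b) follows from (25); and (c) follows from Assumption 5. Equations (76) and (77) show that item (a) of Lemma 3 is satisfied. Item (b) of Lemma 3 is also clearly satisfied since $h(x) = x$ is continuous.

Also, using Assumption 5 it is straightforward to show that item (c) of Lemma 3 is satisfied for some $c > 0$. Thus, all the hypotheses of Lemma 3 are satisfied and we have the limit

$$\lim_{u \to \infty} \hat{x}^u_{\mathrm{scalar}}(z ; \lambda) = h(\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)) = \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda).$$

This proves the lemma. ■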
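The convergence in Lemma 6 can be checked numerically in the lasso case. The sketch below assumes the cost $f(x) = |x|$ and the objective $F(x, z, \lambda) = |z - x|^2/(2\lambda) + f(x)$, for which the scalar MAP estimator is the soft-thresholding operator mentioned in the abstract. The grid, the test values of $z$ and $\lambda$, and the function names are illustrative assumptions, not part of the paper.

```python
import numpy as np

def soft_threshold(z, lam):
    """Scalar MAP estimate under the assumed lasso cost f(x) = |x|."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))

def scalar_mmse(z, lam, u, grid):
    """Posterior mean under q_u(x) proportional to exp(-u * F(x, z, lam))."""
    F = (z - grid) ** 2 / (2.0 * lam) + np.abs(grid)
    # Shift by the minimum for numerical stability; q_u is unaffected.
    w = np.exp(-u * (F - F.min()))
    w /= w.sum()
    return float(np.sum(grid * w))

grid = np.linspace(-10.0, 10.0, 200001)
z, lam = 2.0, 0.5  # assumed test point with |z| > lam
x_map = soft_threshold(z, lam)

gaps = []
for u in (1.0, 10.0, 100.0, 1000.0):
    x_u = scalar_mmse(z, lam, u, grid)
    gaps.append(abs(x_u - x_map))
    print(f"u = {u:6.0f}: MMSE = {x_u:.6f}, MAP = {x_map:.6f}")
```

For this test point the gap between the MMSE and MAP estimates should shrink toward zero as $u$ increases, in line with the lemma.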
We now turn to convergence of the random variable $\theta^u_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff}}(u), u\sigma^2_{\mathrm{p-eff}}(u))$.

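Lemma 7: Consider the random vectors $\theta^u_{\mathrm{scalar}}(\sigma^2, \gamma)$ defined in (58a) and $\theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2, \gamma)$ in (58b). Let $\sigma^2_{\mathrm{eff}}(u)$, $\sigma^2_{\mathrm{p-eff}}(u)$, $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ be as defined in Assumption 2.

Then the following limit holds:

$$\lim_{u \to \infty} \theta^u_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff}}(u), u\sigma^2_{\mathrm{p-eff}}(u)) = \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff,map}}, \gamma_p) \tag{78}$$

almost surely and in distribution.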

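Proof: The proof is similar to that of Lemma 5. For any $\sigma^2$ and $\gamma > 0$, the vectors $\theta^u_{\mathrm{scalar}}(\sigma^2, \gamma)$ and $\theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2, \gamma)$ are deterministic functions of the random variables $x \sim p_0(x)$, $s \sim p_S(s)$, and $z$ given (59) with $v \sim \mathcal{N}(0, 1)$. Lemma 6 shows that the limit

$$\lim_{u \to \infty} \theta^u_{\mathrm{scalar}}(\sigma^2, \gamma) = \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2, \gamma) \tag{79}$$

holds for any values of $\sigma^2$, $\gamma$, $x$, and $s$ and almost all $z$. Also, if we fix $x$, $s$, and $v$, by Assumption 6 the function

$$\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \gamma/s) = \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}\left(x + \sqrt{\sigma^2/s}\, v ; \gamma/s\right)$$

is continuous in $\gamma$ and $\sigma^2$ for almost all values of $v$. Therefore, we can combine (79) with the limits in Assumption 2 to show that

$$\lim_{u \to \infty} \theta^u_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff}}(u), u\sigma^2_{\mathrm{p-eff}}(u)) = \theta^{\mathrm{map}}_{\mathrm{scalar}}(\sigma^2_{\mathrm{eff,map}}, \gamma_p)$$

for almost all $x$ and $s$ and almost all $z$. Since $z$ has a continuous probability distribution (due to the additive noise $v$ in (59)), the set of values where this limit does not hold must have probability zero. Thus, the limit (78) holds almost surely, and therefore, also in distribution. ■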



Appendix V 
Proof of the Fixed-Point Equations 

For the final part of the proof, we need to show that the limits $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ in Assumption 2 satisfy the fixed-point equations (29). The proof is straightforward, but we need to keep careful track of the notation. We begin with the following limit.

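Lemma 8: The following limit holds:

$$\lim_{u \to \infty} \mathbf{E}\left[ s\, \mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) \right] = \mathbf{E}\left[ s\, |x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2 \right],$$

where the expectations are taken over $x \sim p_0(x)$ and $s \sim p_S(s)$, and $z^u$ and $z$ are the random variables

$$z^u = x + \sqrt{\mu^u}\, v, \tag{80a}$$
$$z = x + \sqrt{\mu}\, v, \tag{80b}$$

with $v \sim \mathcal{N}(0, 1)$ and $\mu^u = \sigma^2_{\mathrm{eff}}(u)/s$, $\mu_p^u = \sigma^2_{\mathrm{p-eff}}(u)/s$, $\mu = \sigma^2_{\mathrm{eff,map}}/s$, and $\lambda = \gamma_p/s$.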

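Proof: Using the definitions of mse in (11) and $\hat{x}^u_{\mathrm{scalar}}(z ; \cdot)$ in (57),

$$\mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) = \int_{x \in \mathcal{X}} |x - \hat{x}^{\mathrm{mmse}}_{\mathrm{scalar}}(z^u ; p_u, \mu_p^u)|^2\, p_{x|z}(x \mid z^u ; p_0, \mu^u)\,dx = \int_{x \in \mathcal{X}} |x - \hat{x}^u_{\mathrm{scalar}}(z^u ; u\mu_p^u)|^2\, p_{x|z}(x \mid z^u ; p_0, \mu^u)\,dx.$$

Therefore, fixing $s$ (and hence $\mu_p^u$ and $\mu^u$), we obtain the conditional expectation

$$\mathbf{E}\left[ \mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) \mid s \right] = \mathbf{E}\left[ |x - \hat{x}^u_{\mathrm{scalar}}(z^u ; u\mu_p^u)|^2 \mid s \right], \tag{81}$$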






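where the expectation on the right is over $x \sim p_0(x)$ and $z^u$ given by (80a).

Also, observe that the definitions $\mu^u = \sigma^2_{\mathrm{eff}}(u)/s$ and $\mu = \sigma^2_{\mathrm{eff,map}}/s$ along with the limit in Assumption 2 show that

$$\lim_{u \to \infty} \mu^u = \mu. \tag{82}$$

Similarly, since $\mu_p^u = \sigma^2_{\mathrm{p-eff}}(u)/s$ and $\lambda = \gamma_p/s$, Assumption 2 shows that

$$\lim_{u \to \infty} u\mu_p^u = \lambda. \tag{83}$$

Taking the limit as $u \to \infty$,

$$\lim_{u \to \infty} \mathbf{E}\left[ s\, \mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) \right] \overset{(a)}{=} \lim_{u \to \infty} \mathbf{E}\left[ s\, |x - \hat{x}^u_{\mathrm{scalar}}(z^u ; u\mu_p^u)|^2 \right]$$
$$\overset{(b)}{=} \lim_{u \to \infty} \mathbf{E}\left[ s\, |x - \hat{x}^u_{\mathrm{scalar}}(z^u ; \lambda)|^2 \right]$$
$$\overset{(c)}{=} \lim_{u \to \infty} \mathbf{E}\left[ s\, |x - \hat{x}^u_{\mathrm{scalar}}(z ; \lambda)|^2 \right]$$
$$\overset{(d)}{=} \mathbf{E}\left[ s\, |x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2 \right],$$

where (a) follows from (81); (b) follows from (83); (c) follows from (82), which implies that $z^u \to z$; and (d) follows from Lemma 6. ■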

The previous lemma enables us to evaluate the limit of the MSE in (29a). To evaluate the limit of the MSE in (29b), we need the following lemma.

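Lemma 9: Fix $z$ and $\lambda$, and let

$$g(u, x) = u|x - \hat{x}|^2, \qquad \hat{x} = \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda). \tag{84}$$

Also, let $\varphi(x)$ be given by (75) and $q_u(x)$ be given by (66) with $\mathcal{D} = \mathcal{X}$. Then, for any $\epsilon > 0$, there exists an open neighborhood $U \subseteq \mathcal{X}$ of $\hat{x}$ such that

$$\limsup_{u \to \infty} \left| \int_{x \in U} g(u, x) q_u(x)\,dx - \sigma^2(z, \lambda) \right| < \epsilon,$$

where $\sigma^2(z, \lambda)$ is given in Assumption 6.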

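Proof: The proof is straightforward but somewhat tedious. We will just outline the main steps. Let $\delta > 0$. Using Assumption 5, one can find an open neighborhood $U \subseteq \mathcal{X}$ of $\hat{x}$ such that for all $x \in U$ and $u > 0$,

$$\phi(x, \sigma_-^2(u)) \le \exp(-u(\varphi(x) - \varphi(\hat{x}))) \le \phi(x, \sigma_+^2(u)), \tag{85}$$

where $\phi(x, \sigma^2)$ is the unnormalized Gaussian distribution

$$\phi(x, \sigma^2) = \exp\left( -\frac{1}{2\sigma^2}|x - \hat{x}|^2 \right)$$

and

$$\sigma_+^2(u) = (1 + \delta)\sigma^2(z, \lambda)/u, \qquad \sigma_-^2(u) = (1 - \delta)\sigma^2(z, \lambda)/u.$$

Combining the bounds in (85) with the definition of $q_u(x)$ in (66) and the fact that $U \subseteq \mathcal{X}$ shows that for all $x \in U$ and $u > 0$,

$$q_u(x) = \left[ \int_{x \in \mathcal{X}} e^{-u\varphi(x)}\,dx \right]^{-1} e^{-u\varphi(x)} \le \left[ \int_{x \in U} \phi(x, \sigma_-^2(u))\,dx \right]^{-1} \phi(x, \sigma_+^2(u)).$$

Therefore,

$$\int_{x \in U} g(u, x) q_u(x)\,dx = \int_{x \in U} u|x - \hat{x}|^2 q_u(x)\,dx \le \left[ \int_{x \in U} \phi(x, \sigma_-^2(u))\,dx \right]^{-1} \int_{x \in U} u|x - \hat{x}|^2 \phi(x, \sigma_+^2(u))\,dx. \tag{86}$$

Now, it can be verified that

$$\lim_{u \to \infty} \int_{x \in U} u^{1/2} \phi(x, \sigma_-^2(u))\,dx = \sqrt{2\pi(1 - \delta)}\, \sigma(z, \lambda) \tag{87}$$

and

$$\lim_{u \to \infty} \int_{x \in U} u^{3/2} |x - \hat{x}|^2 \phi(x, \sigma_+^2(u))\,dx = \sqrt{2\pi(1 + \delta)^3}\, \sigma^3(z, \lambda). \tag{88}$$

Substituting (87) and (88) into (86) shows that

$$\limsup_{u \to \infty} \int_{x \in U} g(u, x) q_u(x)\,dx \le \frac{(1 + \delta)^{3/2}}{(1 - \delta)^{1/2}}\, \sigma^2(z, \lambda).$$

A similar calculation shows that

$$\liminf_{u \to \infty} \int_{x \in U} g(u, x) q_u(x)\,dx \ge \frac{(1 - \delta)^{3/2}}{(1 + \delta)^{1/2}}\, \sigma^2(z, \lambda).$$

Therefore, with appropriate selection of $\delta$, one can find a neighborhood $U$ of $\hat{x}$ such that

$$\limsup_{u \to \infty} \left| \int_{x \in U} g(u, x) q_u(x)\,dx - \sigma^2(z, \lambda) \right| < \epsilon,$$

and this proves the lemma. ■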
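The Gaussian bounds in (85) say that, near $\hat{x}$, $q_u(x)$ behaves like a Gaussian with variance $\sigma^2(z, \lambda)/u$, so the scaled second moment $\int u|x - \hat{x}|^2 q_u(x)\,dx$ should approach $\sigma^2(z, \lambda)$. The following minimal sketch checks this for the lasso cost. It assumes $f(x) = |x|$, $F(x, z, \lambda) = |z - x|^2/(2\lambda) + |x|$, and $\sigma^2(z, \lambda) = \lambda$ when $|z| > \lambda$ (the inverse curvature of $F$ at its minimizer); these choices, the grid, and the function name are assumptions for illustration.

```python
import numpy as np

def scaled_second_moment(z, lam, u, grid):
    """Approximate the integral of u * |x - x_hat|^2 * q_u(x) on a uniform grid."""
    F = (z - grid) ** 2 / (2.0 * lam) + np.abs(grid)
    # Minimizer of F for the assumed lasso cost (soft threshold).
    x_hat = np.sign(z) * max(abs(z) - lam, 0.0)
    w = np.exp(-u * (F - F.min()))
    w /= w.sum()
    return float(u * np.sum((grid - x_hat) ** 2 * w))

grid = np.linspace(-10.0, 10.0, 200001)
z, lam = 2.0, 0.5  # assumed test point with |z| > lam, so sigma^2(z, lam) = lam

vals = [scaled_second_moment(z, lam, u, grid) for u in (10.0, 100.0, 1000.0)]
for u, v in zip((10, 100, 1000), vals):
    print(f"u = {u:5d}: u * second moment = {v:.6f} (assumed limit {lam})")
```

Under these assumptions the printed values should settle near $\lambda$, which is the quantity that the second fixed-point equation averages.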
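Using the above result, we can evaluate the scalar MSE.

Lemma 10: Using the notation of Lemma 8,

$$\lim_{u \to \infty} \mathbf{E}\left[ u s\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \right] = \mathbf{E}\left[ s\, \sigma^2(z, \gamma_p/s) \right].$$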
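Proof: This is an application of Lemma 2. Fix $z$ and $\lambda$ and define $g(u, x)$ as in (84). As in the proof of Lemma 6, the conditional distribution $p_{x|z}(x \mid z ; p_u, \lambda/u)$ is given by (74) with $\varphi(x)$ given by (75). The definition of $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)$ in (24) shows that $\hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)$ minimizes $\varphi(x)$. Similar to the proof of Lemma 6, one can show that items (a) and (b) of Lemma 2 are satisfied. Also, Lemma 9 shows that item (c) of Lemma 2 holds with $g_\infty = \sigma^2(z, \lambda)$. Therefore, all the hypotheses of Lemma 2 are satisfied and

$$\lim_{u \to \infty} \int_{x \in \mathcal{X}} u|x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2 q_u(x)\,dx = \sigma^2(z, \lambda), \tag{89}$$

for all $\lambda$ and almost all $z$.

Now

$$\mathrm{mse}(p_u, p_u, \lambda/u, \lambda/u, z) \overset{(a)}{=} \int_{x \in \mathcal{X}} |x - \hat{x}^{\mathrm{mmse}}_{\mathrm{scalar}}(z ; p_u, \lambda/u)|^2\, p_{x|z}(x \mid z ; p_u, \lambda/u)\,dx$$
$$\overset{(b)}{=} \int_{x \in \mathcal{X}} |x - \hat{x}^{\mathrm{mmse}}_{\mathrm{scalar}}(z ; p_u, \lambda/u)|^2 q_u(x)\,dx$$
$$\overset{(c)}{=} \int_{x \in \mathcal{X}} |x - \hat{x}^u_{\mathrm{scalar}}(z ; \lambda)|^2 q_u(x)\,dx, \tag{90}$$

where (a) is the definition of mse in (11); (b) follows from (74); and (c) follows from (57). Taking the limit of this expression,

$$\lim_{u \to \infty} u\, \mathrm{mse}(p_u, p_u, \lambda/u, \lambda/u, z) \overset{(a)}{=} \lim_{u \to \infty} \int_{x \in \mathcal{X}} u|x - \hat{x}^u_{\mathrm{scalar}}(z ; \lambda)|^2 q_u(x)\,dx$$
$$\overset{(b)}{=} \lim_{u \to \infty} \int_{x \in \mathcal{X}} u|x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2 q_u(x)\,dx$$
$$\overset{(c)}{=} \sigma^2(z, \lambda), \tag{91}$$

where (a) follows from (90); (b) follows from Lemma 6; and (c) follows from (89).

The variables $z^u$ and $z$ in (80a) and (80b) as well as $\mu^u$ and $\mu_p^u$ are deterministic functions of $x$, $v$, $s$, and $u$. Fixing $x$, $v$, and $s$ and taking the limit with respect to $u$ we obtain the deterministic limit

$$\lim_{u \to \infty} u\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \overset{(a)}{=} \lim_{u \to \infty} u\, \mathrm{mse}(p_u, p_u, \sigma^2_{\mathrm{p-eff}}(u)/s, \sigma^2_{\mathrm{p-eff}}(u)/s, z^u)$$
$$\overset{(b)}{=} \lim_{u \to \infty} \sigma^2(z^u, u\sigma^2_{\mathrm{p-eff}}(u)/s)$$
$$\overset{(c)}{=} \lim_{u \to \infty} \sigma^2(z, u\sigma^2_{\mathrm{p-eff}}(u)/s)$$
$$\overset{(d)}{=} \sigma^2(z, \gamma_p/s), \tag{92}$$

where (a) follows from the definitions of $\mu^u$ and $\mu_p^u$ in Lemma 8; (b) follows from (91); (c) follows from the limit (proved in Lemma 8) that $z^u \to z$ as $u \to \infty$; and (d) follows from the limit in Assumption 2.

Finally, observe that for any prior $p$ and noise level $\mu$,

$$\mathrm{mse}(p, p, \mu, \mu, z) \le \mu,$$

since the MSE must be smaller than the additive noise level $\mu$. Therefore, for any $u$ and $s$,

$$u s\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \le u s \mu_p^u = u\sigma^2_{\mathrm{p-eff}}(u),$$

where we have used the definition $\mu_p^u = \sigma^2_{\mathrm{p-eff}}(u)/s$. Since $u\sigma^2_{\mathrm{p-eff}}(u)$ converges, there must exist a constant $M > 0$ such that

$$u s\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \le u s \mu_p^u \le M,$$

for all $u$, $s$ and $z^u$. The lemma now follows from applying the Dominated Convergence Theorem and taking expectations of both sides of (92). ■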

We can now show that the limiting noise values satisfy the 
fixed-point equations. 

Lemma 11: The limiting effective noise levels $\sigma^2_{\mathrm{eff,map}}$ and $\gamma_p$ in Assumption 2 satisfy the fixed-point equations (29a) and (29b).

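Proof: The noise levels $\sigma^2_{\mathrm{eff}}(u)$ and $\sigma^2_{\mathrm{p-eff}}(u)$ satisfy the fixed-point equations (13a) and (13b) of the Replica MMSE Claim with the postulated prior $p_{\mathrm{post}} = p_u$ and noise level $\sigma^2_{\mathrm{post}} = \gamma/u$. Therefore, using the notation in Lemma 8,

$$\sigma^2_{\mathrm{eff}}(u) = \sigma_0^2 + \beta\, \mathbf{E}\left[ s\, \mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) \right] \tag{93a}$$
$$u\sigma^2_{\mathrm{p-eff}}(u) = \gamma + \beta\, \mathbf{E}\left[ u s\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \right] \tag{93b}$$

where (as defined in Lemma 8) $\mu^u = \sigma^2_{\mathrm{eff}}(u)/s$ and $\mu_p^u = \sigma^2_{\mathrm{p-eff}}(u)/s$, and the expectations are taken over $s \sim p_S(s)$, $x \sim p_0(x)$, and $z^u$ in (80a).

Therefore,

$$\sigma^2_{\mathrm{eff,map}} \overset{(a)}{=} \lim_{u \to \infty} \sigma^2_{\mathrm{eff}}(u) \overset{(b)}{=} \sigma_0^2 + \beta \lim_{u \to \infty} \mathbf{E}\left[ s\, \mathrm{mse}(p_u, p_0, \mu_p^u, \mu^u, z^u) \right] \overset{(c)}{=} \sigma_0^2 + \beta\, \mathbf{E}\left[ s\, |x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2 \right],$$

where (a) follows from the limit in Assumption 2; (b) follows from (93a); and (c) follows from Lemma 8. This shows that (29a) is satisfied.

Similarly,

$$\gamma_p \overset{(a)}{=} \lim_{u \to \infty} u\sigma^2_{\mathrm{p-eff}}(u) \overset{(b)}{=} \gamma + \beta \lim_{u \to \infty} \mathbf{E}\left[ u s\, \mathrm{mse}(p_u, p_u, \mu_p^u, \mu_p^u, z^u) \right] \overset{(c)}{=} \gamma + \beta\, \mathbf{E}\left[ s\, \sigma^2(z, \lambda) \right],$$

where (a) follows from the limit in Assumption 2; (b) follows from (93b); and (c) follows from Lemma 10. This shows that (29b) is satisfied. ■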
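To illustrate how the limiting pair $(\sigma^2_{\mathrm{eff,map}}, \gamma_p)$ might be computed in practice, the sketch below iterates the two limits derived in the proof, $\sigma^2_{\mathrm{eff,map}} = \sigma_0^2 + \beta\, \mathbf{E}[s|x - \hat{x}^{\mathrm{map}}_{\mathrm{scalar}}(z ; \lambda)|^2]$ and $\gamma_p = \gamma + \beta\, \mathbf{E}[s\, \sigma^2(z, \lambda)]$, for the lasso. Everything specific here is an assumption for illustration: $s = 1$, a hypothetical three-point sparse prior, the cost $f(x) = |x|$ (so the scalar MAP estimator is a soft threshold), $\sigma^2(z, \lambda) = \lambda$ for $|z| > \lambda$ and $0$ otherwise, and the parameter values. Plain iteration is used for simplicity and is not guaranteed to converge in general.

```python
import math
import numpy as np

# Hypothetical problem parameters (not taken from the paper).
BETA, SIGMA0_SQ, GAMMA = 0.5, 0.01, 0.1
X_VALS = np.array([0.0, 1.0, -1.0])    # assumed three-point prior on x
X_PROB = np.array([0.9, 0.05, 0.05])

# Deterministic quadrature grid for v ~ N(0, 1).
V = np.linspace(-8.0, 8.0, 16001)
W = np.exp(-0.5 * V ** 2)
W /= W.sum()

def soft(z, lam):
    """Soft threshold: scalar MAP estimate for the assumed lasso cost."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def q_func(t):
    """Gaussian tail probability P(v > t)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def update(sig2, gam_p):
    """One pass through both fixed-point equations with s = 1."""
    sd = math.sqrt(sig2)
    mse, p_active = 0.0, 0.0
    for x, p in zip(X_VALS, X_PROB):
        z = x + sd * V
        mse += p * float(np.sum(W * (x - soft(z, gam_p)) ** 2))
        # P(|z| > gam_p), where the assumed sigma^2(z, lam) equals lam.
        p_active += p * (q_func((gam_p - x) / sd) + q_func((gam_p + x) / sd))
    return SIGMA0_SQ + BETA * mse, GAMMA + BETA * gam_p * p_active

sig2, gam_p = SIGMA0_SQ, GAMMA
for _ in range(200):
    sig2, gam_p = update(sig2, gam_p)
print(f"sigma_eff_map^2 ~ {sig2:.6f}, gamma_p ~ {gam_p:.6f}")
```

With these assumed parameters the iteration settles quickly; both limiting values exceed their noise-only baselines $\sigma_0^2$ and $\gamma$, reflecting the extra effective noise contributed by the estimation error.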